Chapter 1


Cognitive Science and Graphic Design


Leland Wilkinson 


This chapter has nothing to do with SYSTAT. It has everything to do with designing 
good graphs. 


The Function of Quantitative Graphics 


Graphics can have many functions. They can entertain. They can persuade. The 
function of quantitative graphics, however, is to inform. By presenting graphics to 
others, you are attempting to communicate information through a wide and complex 
channel: the human visual system. In communicating this information, you can 
entertain or persuade or do other things, but if you distort the information underlying 
your graphics, you have failed. 

Many designers of quantitative graphics confuse these functions or subordinate the 
goal of informing to other goals. Sometimes this is intentional, as in graphic 
propaganda, but often it is inadvertent, as in popular newspaper graphs that distort 
their message with bright colors and “perspective” views. 

If you think of quantitative graphics as a mode of information processing, then you 
can use the tools of cognitive psychology to evaluate displays. This chapter will cover 
the basics of visual information processing and the principles of graphic design, and 
then conclude with examples of good and bad graphs. If you want to read more in this 
area, refer to Frisby (1980); Haber and Hershenson (1980); Levine and Shefner 
(1981); Spoehr and Lehmkuhle (1982); Tufte (1983); Chambers, Cleveland, Kleiner,
and Tukey (1983); and Cleveland (1985 and 1993).


Visual Information Processing 


The visual system can be represented by several abstract components. Figure 1-1 
shows these components schematically. The graphic image is a composite of physical 
aspects that contain the information you are communicating. The image may originate 
on paper, on a computer screen, or in another medium. For our purposes, the image is 
the set of critical features that stimulate the retina to fire its neurons such that the 
remainder of the visual system in the brain can process the information. 

Iconic memory is the first component of memorization. Cells in the retina fire 
neural impulses when they absorb a quantity of photons from a light stimulus. They 
continue to fire for a short time after the stimulation ceases and thus serve as a brief 
store for the image itself. 

Short-term memory holds essential features of the stimulus so that it can be 
integrated into the framework of long-term memory. A famous study by George Miller 
(1956) and later studies by others have indicated that short-term memory can hold at 
least four and perhaps as many as seven or eight "chunks" of information. These 
chunks can be made up of other chunks from long-term memory, so that short-term 
memory can be used to build up arbitrarily complex constructs. This is why two arrows 
are used between short- and long-term memory to indicate feedback. Some 
psychologists working in verbal learning claim that short-term memory is acoustical, 
meaning that information is rehearsed subvocally until it can be integrated into
long-term memory. Others, such as Shepard (1978) and Kosslyn (1980), have shown that
visual perceptual units can be stored in short-term memory as well. In either case,
perceptual chunking in short-term memory allows time for long-term memory
encoding to occur. This process takes roughly 20 seconds for each chunk of
information.

Long-term memory contains the permanently remembered information from the 
perceived graph. “Permanently” is used advisedly because there is no compelling 
evidence to show that one ever forgets anything once it is encoded into long-term
memory (assuming no physiological damage from toxins such as alcohol or physical 
deterioration associated with aging). Forgetting is more likely a failure to draw 
connections between associated information stored in memory. “Forgotten” 
information can often be recovered by a careful reconstruction of associated 
information, experiences, and sensations. 


Figure 1-1 
Abstract representation of the visual system 


[Diagram: Graphic Image, Iconic Memory, Short-Term Memory, and Long-Term Memory, left to right; above them, the information processed at each stage: Image, Feature, Schema.]


Above the diagram are the components of the information that are processed at each 
stage. Information passed from the graphic image to iconic memory depends on the 
optical quality of the image. If a graph has poor contrast and fuzzy lines, for example, 
critical aspects will not register in the iconic store. Knowing this, you should attempt 
to keep graphic images clean, with high contrast and crisp delineation. If you use 
colors, you should avoid faint pastels, muddy tones, and other low-intensity shades. 

Features of an image are transmitted from the iconic store to short-term memory. If 
a graph has too many features (for example, 15 curves), the information cannot be held 
in short-term memory after a few seconds of looking at the graph. Knowing this, you 
should limit the essential features in one graph to a manageable number, unless you 
expect your viewers to spend considerable time processing the information. 

Finally, information in short-term memory is integrated into long-term memory via 
schemas. A “network” symbol is used to represent long-term memory because current 
theories frequently use this descriptive structure. Schemas are networks of associations 
that integrate information. If you abstract a graph of sales over years, for example, you 
might recall that sales increased in a straight line over the years involved. You might 
remember this straight line by associating it with a verbal description of the formula 
describing sales from years (some mathematicians and actuaries might remember a
graph this way). Or, you might keep a visual image of the slope of the line relative to 
the frame around the graph and associate this image with remembered values of the 
axes (for example, “millions of sales in the 1960's"). 

Psychologists disagree on exactly how information is stored in long-term memory. 
There is evidence that information can be stored as a set of linked propositions and 
other evidence that it can be stored as linked icons, or visual mental representations. It 
may be stored both ways. You do not need to resolve this controversy to decide how 
to use graphic designs, however. In either case, you should realize that the information 
in a graph will be stored most effectively if it can be associated with other information.
Unusual scales, break points in graphs, or puzzling anchor points for data values may 
all interfere with the process of storing the fundamental information in a graph—for 
example, the change in one variable relative to another. 


The Psychophysics of Perception 


You have seen a representation of the path from graphic image to long-term memory. 
While this structure has implications for the design of good graphs, you still need to 
understand how images are perceived. The visual system, like our other sensory 
systems, encodes information in various forms. Perhaps because quantitative graphs 
are relatively recent visual icons in human evolutionary history, you process them with 
the same tools you use to perceive three-dimensional scenes and two-dimensional 
pictures. Sometimes these tools cause you to distort the very information you are 
attempting to perceive accurately. 


The Power Law 


Early in the nineteenth century, Weber noticed that the increment in stimulation 
required to produce a just noticeable difference between two stimuli was proportional 
to the size of the stimuli: the bigger (more intense) the stimuli, the bigger the increment
needed to notice a difference between the two. For example, you can easily see two
objects separated on your desk. You have more trouble discriminating between two
objects a quarter mile from your desk if they are separated by the same distance.

Not long after Weber's discovery, Fechner derived a scale for sensation. Assuming 
that Weber's just noticeable differences in sensation were equivalent at all levels of 
stimulation, Fechner computed a logarithmic function relating the magnitude of 
sensation (S) to the intensity of stimulation (I).

In Fechner's function, sensation increases logarithmically with stimulation, so
equal ratios of stimulation produce equal differences in sensation:

$$S = k \log(I)$$
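Because each just noticeable difference corresponds to a fixed ratio of stimulation (Weber), Fechner's logarithm turns equal ratios into equal steps of sensation. The minimal Python sketch below checks this property numerically. The constant k = 1.0 and the use of the natural logarithm are arbitrary illustrative choices; the base of the logarithm only rescales k.

```python
import math

def fechner_sensation(intensity: float, k: float = 1.0) -> float:
    """Fechner's law, S = k * log(I), using the natural log (an arbitrary choice)."""
    return k * math.log(intensity)

# Doubling the stimulus adds the same amount of sensation at every level:
# S(2I) - S(I) = k * log(2), whatever the starting intensity I.
for intensity in (1.0, 10.0, 1000.0):
    step = fechner_sensation(2 * intensity) - fechner_sensation(intensity)
    print(f"I = {intensity:7.1f}   S(2I) - S(I) = {step:.4f}")
```

Each line prints the same increment (log 2, about 0.6931), which is the sense in which equal ratios of stimulation produce equal differences in sensation.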

In the 1950s, Stevens (following the work of Plateau in the 1800s) proposed a
power function for sensation instead of Fechner's logarithmic curve. Using a wide
variety of stimuli, Stevens fit his data with a power function:

$$S = k I^{p}$$

There is still some controversy over whether Stevens' or Fechner's curve better
describes perceptual data, and it is even possible that both are correct under different
experimental conditions. Figure 1-2 shows both curves for a typical application. The
important thing to notice is the downward curvature of both. In practical terms, 
increasing the size of symbols on a graph will not increase the perceived size in the 
same increments. Increasing the darkness of a filled area will not increase the 
perceived darkness in the same increments. This downward bias should make you 
wary of using area, darkness, and volume in graphs when you have other methods for 
representing quantitative variation that are less susceptible to these distortions. 

Stevens and his associates measured a wide variety of auditory, visual, and tactile
stimuli. For your purposes, it is most useful to note that the value of the exponent in
the power function varies across types of graphical stimuli. For length judgments, it
can range from 0.9 to 1.1; for area, from 0.6 to 0.9; and for volume, from 0.5 to 0.8
(Baird, 1970; Cleveland, 1985).
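One way to see what these exponents imply is to ask how large a doubled stimulus looks. Under the power function the constant k cancels, so the perceived ratio of two stimuli is (I2 / I1) raised to the power p. The sketch below uses the midpoints of the ranges just quoted (1.0 for length, 0.75 for area, 0.65 for volume); those midpoints are an illustrative assumption, not measured values.

```python
def perceived_ratio(physical_ratio: float, exponent: float) -> float:
    """Stevens' power law S = k * I**p; the ratio S2/S1 = (I2/I1)**p, so k cancels."""
    return physical_ratio ** exponent

# Midpoints of the exponent ranges quoted in the text (illustrative only).
exponents = {"length": 1.0, "area": 0.75, "volume": 0.65}

for feature, p in exponents.items():
    print(f"{feature:>6}: a doubled stimulus looks x{perceived_ratio(2.0, p):.2f} as large")
```

Length comes out at x2.00, while area and volume come out noticeably below x2, which is the downward bias described above.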
Figure 1-2
Power and log curves relating stimulus intensity and magnitude of sensation 
Circle areas and densities follow power law
Figure 1-3 illustrates this perceptual bias. In which pair of circles (A, B, or C) is the right
circle twice the area of the left? What about the densities on the right in Figure 1-3?
Which is twice as dark as its partner?

Each pair was drawn using a different exponent in the Stevens power function to
modify what would otherwise be twice the area of the circle on the left or twice the
density of the rectangle on the left. Pair A has an exponent of 1.0, pair B has 0.95, and
pair C has 0.90. The answer, then, is that the right circle in pair A is twice the area of
the circle on the left, and the right rectangle in pair A is twice as dark as the one on the
left.

These examples should alert you to the dangers of using area and shading to 
represent numerical quantities. One solution might be to adjust areas, shadings, and 
other features in a graph to fit the psychometric functions derived from perceptual 
experiments. This is easy to do in a computerized graphing package. However, there 
are usually alternative ways to represent quantitative variation without resorting to 
shading, area, or other features governed by exceptionally flat psychometric functions. 
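If you did want to adjust a symbol to a psychometric function, inverting the power law gives the physical ratio needed: to make a stimulus look r times as large, scale it by r raised to the power 1/p. The sketch below applies this to a circle, assuming a hypothetical area exponent of 0.75 (within the 0.6 to 0.9 range quoted earlier). Because area grows with the square of the radius, the radius is scaled by the square root of the area factor. The function name is illustrative only.

```python
import math

def compensated_ratio(target_perceived_ratio: float, exponent: float) -> float:
    """Invert S2/S1 = (I2/I1)**p: the physical ratio needed is target**(1/p)."""
    return target_perceived_ratio ** (1.0 / exponent)

AREA_EXPONENT = 0.75  # hypothetical value within the 0.6-0.9 range for area

area_factor = compensated_ratio(2.0, AREA_EXPONENT)  # how much larger the area must be
radius_factor = math.sqrt(area_factor)               # area scales with radius squared

print(f"area x{area_factor:.2f}, radius x{radius_factor:.2f}")
```

The catch is that the correct exponent varies across viewers and conditions, which is one reason the text favors representations that avoid area and shading altogether.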


Visual Illusions 


PICTURE: a representation in two dimensions of something wearisome in three. 


—Ambrose Bierce, The Devil's Dictionary


Pictures have a dual reality (Haber and Hershenson, 1980). We live in a three- 
dimensional world in which pictures are two-dimensional, yet pictures can represent 
three-dimensional objects. Consequently, our perception of graphs (pictures) is 
influenced by the tools we have for perceiving three-dimensional space. Sometimes, 
these tools interfere with accurate perception of a graph. 

Figure 1-4 shows some well-known two-dimensional illusions. The first (A) is a
horizontal-vertical illusion in which two equal line segments are distorted by relative
orientation. The second (B) is the Muller-Lyer illusion, in which equal line segments
are distorted by intersecting angles. C is the Poggendorff illusion, in which the diagonal
segments lie on a common line but are displaced by the verticals. D is a Delboeuf
figure, in which the sizes of the center circles are equal but distorted by their surrounds.
Finally, E is a Ponzo illusion, in which the perceived sizes of two equal circles are
distorted by the surrounding perspective angle. Coren and Girgus (1978) document
many other illusions.
Figure 1-4
Visual illusions 
Gregory (1969) and others believe that many of these illusions evoke
three-dimensional depth cues that are inappropriately applied in two-dimensional contexts.
The Muller-Lyer and Ponzo illusions, for example, distort size judgments by 
surrounding stimuli with angular depth cues. The Delboeuf figure may involve 
"tunnel" cues often used in three-dimensional processing. These features make it 
difficult to judge absolute size in two dimensions because we are accustomed to using 
depth cues for three-dimensional size judgments. 

Whatever the explanation for these illusions, keep in mind that judgments involving 
angles and figure-ground relations (such as in illusion D) in graphs often are biased. If 
you can find alternatives to angle representations, such as parallel straight line 
segments, you will often be more successful in communicating information accurately. 


Gestalt Psychology and Figure-Ground Separation 


Early in this century, Gestalt psychologists proposed that "the whole is more than the 
sum of its parts." In graphical terms, this means that elements in a graph look different 
when viewed alone than when viewed in the context of the entire graph. The Gestalt 
psychologists showed, for example, that when objects are placed near each other, they 
are perceived as part of an integral pattern. Furthermore, similar objects in an overall
display tend to be perceived as part of a unified pattern. Other features of objects, such
as symmetry and continuity, affect how we perceive them when embedded in more 
general patterns. This perceptual organization is not always inherent to the retinal 
image—we impose organization on the image in order to process it. 

The figure-ground effect is a closely related phenomenon. Objects can be framed or 
placed against a background in ways that change their appearance. An object that 
contrasts with its background, for example, tends to look more integrated than one that 
does not. Cleveland, Diaconis, and McGill (1982) presented an interesting example of 
this effect. They showed that when point clouds of scatterplots are surrounded by 
larger or smaller frames, people's perceptions of the correlation between the
represented variables changed.

Figure 1-5 shows an illustration of this effect using data generated and plotted in 
SYSTAT. The data are identical in both plots. 
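The point of the demonstration is that nothing in the data changes. Shrinking a point cloud inside a larger frame amounts to a positive linear rescaling of the plotted coordinates, and the Pearson correlation is unchanged by any such rescaling, so any difference you see is perceptual. The sketch below checks this with randomly generated data; the data, the seed, and the rescaling constants are hypothetical, and this is not the SYSTAT procedure used to draw Figure 1-5.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # arbitrary seed so the example is repeatable
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [0.6 * x + random.gauss(0, 0.8) for x in xs]  # hypothetical correlated data

# Drawing the cloud smaller inside a bigger frame is a positive linear
# rescaling of the plotted coordinates; the correlation does not change.
scale, shift = 0.4, 3.0
xs_small = [scale * x + shift for x in xs]
ys_small = [scale * y + shift for y in ys]

print(round(pearson_r(xs, ys), 6), round(pearson_r(xs_small, ys_small), 6))
```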
Figure 1-5
The top scatterplot looks more correlated 


The Perception of Color 


Color is one of the most popular media in computer graphics. Unfortunately, it is also 
one of the most difficult to use effectively. We want graphs to look pretty, and so we 
choose color to represent scales or categories. In doing so, we often overlook the
complexities involved in perceiving color.
Color is not a physical characteristic of objects or light. It is a purely psychological
phenomenon, a “fabrication of the visual system” (Levine and Shefner, 1981). The 
colors we see are the summation of stimulation by light photons of three different 
pigments in our retina. The firings of neurons associated with these different pigments 
are integrated in the visual system to construct every color we see. 

Because perceived color is a summation of stimulation, the same perceived color 
can be produced by an infinite number of different physical characteristics in an object 
and/or light source. 

Any three different wavelengths of light can be added (or subtracted) in different 
quantities to produce the entire visible spectrum, but wavelengths corresponding 
roughly to RED, GREEN, and BLUE are used in our visual system. For similar operating 
reasons, color computer terminals and televisions mix the same basic colors. 

Most of us were introduced to color theory by holding up a prism to see Newton's 
spectrum, which appears to be a linear ordering from short (deep violet) to long (deep 
red) wavelengths. Some computer displays use this spectrum to represent dimensions 
(for example, COOL-WARM temperature or LOW-HIGH altitude). Because of the way 
our visual system sums wavelengths, however, we do not perceive the spectrum 
linearly. We perceive it as an open circle or horseshoe, with deep red and deep violet 
at each end of the opening and green at the opposite closed portion. Deep red appears 
closer to deep violet than to green, for example. You can see a scaling of this circle in 
SYSTAT Statistics. Thus, if you want to use color to represent a linear ordering, you 
should probably choose a segment of the spectrum, say, from green to red. 
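A minimal sketch of such a green-to-red scale follows. It assumes the HSV color model from Python's standard colorsys module, with hue running from 120 degrees (green) to 0 degrees (red) at full saturation and value. The function name green_to_red is hypothetical, and this is not how SYSTAT builds its color scales. Note that holding the HSV value channel fixed does not equalize perceived brightness, since pure hues do not appear equally bright.

```python
import colorsys

def green_to_red(value: float, vmin: float, vmax: float) -> tuple[int, int, int]:
    """Map value in [vmin, vmax] to an RGB color on the green-to-red arc of the hue circle."""
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)          # clamp out-of-range values to the ends of the scale
    hue = (1.0 - t) * (120.0 / 360.0)  # 120 degrees = green, 0 degrees = red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)

for v in (0, 25, 50, 75, 100):
    print(v, green_to_red(v, 0, 100))
```

The scale passes through yellow at its midpoint and never wraps around toward violet, so the two ends of the ordering cannot be confused with each other.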

Another complication affects the use of color in graphics. A spectral color of a given
hue can be mixed with white light to make it appear pale. Mixture affects not only the
saturation of a color but also its energy or brightness. Pure spectral colors do not
appear equally vivid or bright. When you choose to manipulate hue, SYSTAT
approximately controls the brightness and saturation of colors on most graphics devices.

Colors are best used to represent categories instead of scales. You might use red 
symbols for an experimental group and green for a control group, for example. 
Perception of color categories is innate, cross-cultural, and not dependent on language. 
That Eskimos have more words to describe the color of snow does not mean that they
perceive more colors in an arctic landscape. Infants, for example, show clear
boundaries between colors (Bornstein, Kessen, and Weiskopf, 1976). When using
colors for category definition in a graph, it is a good idea to choose contrasting colors 
(for example, RED-GREEN, or RED-YELLOW-GREEN-BLUE) to enhance these boundary 
discriminations. 

Colors can create visual illusions. A gray patch against a green background will 
appear reddish, for example. You may have noticed a similar contrast effect after 
working at a green computer screen. The world looks pink when you look away from 
the screen. Colors also affect area judgments. In three dimensions, a blue disk appears 
to be farther away than a red disk of the same size, controlling for saturation and 
brightness. In two dimensions, a red area will look larger than a blue area, probably 
because of a three-dimensional illusion. Cleveland and McGill (1983), for example, 
found that people judged red areas on maps to be larger than blue areas. Durrett (1987) 
contains several informative papers on the use of color for computer graphics. 


Graphic Design 


You can apply psychological principles to the design of graphs, and you can 
supplement them with aesthetic principles. Cleveland (1985) integrated both areas in 
a landmark book. After a survey of statistical and psychological research, including 
some of his own, Cleveland derived an approximate ordering of graphical features 
from most to least accurate in representing quantitative variation. Figure 1-6 presents 
this hierarchy. 

The criterion for constructing this hierarchy is the linear agreement between the 
quantitative information presented graphically to subjects and the actual values 
underlying the graphical representation. In a variety of experiments, tasks involving 
modes higher up in the hierarchy were performed more accurately than tasks lower in 
the hierarchy. Thus, all other things being equal, you should use a bar chart rather than 
a pie chart for presenting comparative information; a bar chart provides a common 
scale, and a pie chart involves angle judgments. Cleveland's hierarchy is somewhat 
oversimplified. Simkin and Hastie (1987), for example, have found exceptions to this 
rule when proportional judgments are involved. Nevertheless, Cleveland's basic
hierarchy has proved useful in practice. 

Sometimes you have no choice. Time series plots, scatterplots, and mathematical 
functions often require angle judgments because slope is intrinsically angular relative 
to a horizontal or vertical orientation. In these cases, experimental evidence indicates 
that it is important to choose scales that make the physical slope of the graphed 
function as close to 45 degrees as possible (Cleveland and McGill, 1988). 
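One way to act on this advice is to choose the plot's aspect ratio (height divided by width) so that a typical line segment sits at 45 degrees. The sketch below assumes a simple median-absolute-slope rule, one of several banking rules in the literature; it is an illustration rather than a SYSTAT feature, and the function name bank_to_45 is hypothetical. It treats the data as a connected line and skips vertical segments.

```python
import statistics

def bank_to_45(xs, ys):
    """Aspect ratio (height / width) that makes the median absolute slope of
    the plotted line segments equal to 1, i.e. 45 degrees on the page."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = [abs((y1 - y0) / (x1 - x0))
              for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
              if x1 != x0]
    median_slope = statistics.median(slopes)
    if median_slope == 0:
        raise ValueError("median slope is zero; banking is undefined")
    # On the page, a data slope s appears as s * (x_range / y_range) * (height / width).
    # Setting that to 1 at the median slope gives height / width:
    return y_range / (x_range * median_slope)

# Hypothetical zigzag series: each segment rises or falls by 1 over 1 unit of x.
print(bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]))
```

For the zigzag, the function returns 0.25: a plot one quarter as tall as it is wide, in which every segment runs at 45 degrees.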

Bertin (1983) and Tufte (1983) have written about graphics more from a design 
point of view. Both stress economy and simple graphic icons. Although both speak of 
maximizing the information in a graph, you should qualify this rule with what you 
know about the visual system. In graphs intended for a glance, such as in slide 
presentations, Mies van der Rohe's dictum “less is more” is a better rule. If a graph 
contains too many visual modes, its information is unlikely to make its way into
long-term memory. On the other hand, if you are presenting graphs in a publication, you can
tolerate a high degree of complexity—provided that components of the graph can be 
processed in "chunks" to make their way from short-term to long-term memory. By 
now, you should know that there is no simple rule for distinguishing good graphics 
from bad. The appropriateness of a graph depends on the conditions in which it is 
presented and the information to be communicated. 
Figure 1-6
Cleveland graphic elements hierarchy 
Some Examples


A few examples should illustrate the psychological and design principles you have 
seen. The following figures contrast two alternative graphs of the same information. 
The upper graphs are less effective in communicating the information than the lower. 


Perspective Graphs 


Figure 1-7 illustrates a three-dimensional bar graph and a line graph. These 
perspective bar charts are popular in business programs. As in all perspective plots, the
depth information is confusing and gives rise to several visual illusions. The actual
height of the bars is difficult to establish. The plot can be ruined further if color coding
is added to the bars, which introduces additional perspective illusions.

The lower graphs in the figure are less glamorous but more effective. If you want to 
compare trends between the two grouping variables, the line graph is particularly 
useful. It is easy to see where the profiles are parallel, and their heights and values are 
easily identified on the common scale. If you are more interested in highlighting 
differences at each comparison point, the multivalue bar chart is more suitable. Here, 
the graphic focus is on each pair of bars, facilitating individual comparisons. Finally, 
you should consider a dot plot (Cleveland, 1985). Dot plots are similar to bar graphs, 
but they do not connect the data points to a base the way bars do. 


Figure 1-7 
Three-dimensional bars versus lines 
Pseudo-Perspective Bar Charts


A chart that is similar to the perspective bar chart is the pseudo-perspective bar chart. 
Illustrators frequently feel the need to make two-dimensional bars look like blocks or
skyscrapers. Doing so makes it difficult to reference the top of the bar against a scale. 
It is never clear whether the “front” or “back” of the bar is intended to be the height 
indicator. Figure 1-8 shows an example of this type of graph. The figure on the left is 
a double bar graph with pseudo-perspective to enhance the display. The same 
information is contained in the graph on the right, but it is less cluttered, more 
informative, and aesthetically more pleasing. As with Figure 1-7, you could represent 
this information with a simple line graph, especially if parallelism of the profiles were
your primary interest.


Figure 1-8 
Pseudo-perspective bars versus two-dimensional bars 


Pseudo-Perspective Line Graphs


The graph on the left in Figure 1-9 was adapted from a chart of grain production in 
China and the Soviet Union featured in a leading national newspaper some years ago. 
The point of the article was to highlight the widening gap between Soviet and Chinese 
grain production. Although the graph shows production on the vertical axis against 
years on the horizontal axis, it does little to make the point. First, the three-dimensional 
perspective makes it difficult to line up the two trends. Shifting the lower trend to the 
left to simulate perspective ruins the calibration of the horizontal scale. Second, the 
uneven shading across the graph enhances our depth perception, but it ruins our focus
on the widening gap, which is the purpose of the graph and article. 

The graph on the right represents the same data in a simple, two-dimensional profile 
chart. The fill area is dark enough to contrast strongly with the background, yet values 
can still be read against the gridlines. 


Figure 1-9 
Pseudo-perspective line graph versus filled line graph 


Perspective Pie Charts 


Pie charts are among the most abused graphics icons. A favorite among business 
packages is the three-dimensional pie chart. These charts frequently appear in 
newspapers, TV graphics, and textbooks. They incorporate nearly every visual illusion 
discussed in this chapter. Figure 1-10 shows an example of a 3-D versus a regular pie
chart with the same information. The upper figure includes some of the texturing that
is popular in these displays and that further distorts the proportional area information.
The shading on the side of the pie makes the area judgments even more difficult.
Finally, pulling the slice out of the pie impedes anchoring judgments, in which you
must mentally superimpose one slice on another in order to compare their magnitudes.
Pulling slices out of 3-D or 2-D pies is never as
effective as shading or coloring the slice in its proper place in the pie. Coloring and 
shading pies can enhance their attractiveness, but if you are interested in accurate 
judgments, keep them empty. Both shading and coloring interfere with size judgments. 
You may want to try the upper figure on several friends. Cover the lower pie and 
ask them to judge the area of the slice as a percentage of the total area. Then ask them 
to judge the same slice in the lower pie. The answer for both is just under 30 percent.
People tend to underestimate the 3-D slice. 


Figure 1-10 
Three-dimensional versus two-dimensional pie chart 
Information Overload


You can put too much information in a graph intended for a glance. Tufte (1983) and 
Bertin (1983) recommend a high ratio of “data” to “ink” in order to discourage 
distracting irrelevant features, but this principle can backfire if pushed to the extreme. 
Figure 1-11 is a graph adapted from an advertisement that shows all the bells and 
whistles of a new computer graphics package. The graph was being used in a slide 
presentation. A composite graph of this sort is like “integrated” software—the pieces 
work tolerably but the whole is intimidating. In trying to cram too much information
into a single panel of a display, the designer of this graph compromised the individual 
choices. An effective alternative to this graph might include several independent 
graphs. 


Figure 1-11 
An excessively complicated graph 


There are times when a complex graph can be appropriate—even in a slide 
presentation. A previous section in this chapter mentioned that you can "chunk" 
complex information in short-term memory if you can find simple rules and analogies 
for processing it. Memory experts do this when they memorize thousands of digits of 
random numbers. For graphs, the best way to facilitate chunking is to integrate the
components in an ordered arrangement. See Chapter 6 for examples of complex graphs
that can nevertheless be memorized after careful viewing.
Conclusion


Many of the "bad" examples in this chapter add depth to an essentially 2-D graph. This
is the style preferred by most newspapers and businesses. There is nothing wrong with
embellishing graphs to catch people's attention, but you should not deceive yourself
that these decorations facilitate communication. It also demeans your viewers if you
assume they will not take the time to view your data thoughtfully. As with writing, keep 
your graphs simple, and your audience will get your message. 
Bar, Dot, Line, Profile,
Pyramid, and Pie Charts 


Leland Wilkinson 


SYSTAT offers six univariate graphical displays useful for characterizing the values
of a single variable for a complete sample or for subpopulations defined by one or two
stratifying variables. For each category of a variable or cross-classification of two
stratifying variables:


Bar displays a bar 

Dot displays a dot (or other plot symbol) where the top of the bar would be 

Line displays a line connecting the points where the dots or tops of bars would be

Profile fills in the area under the line

Pyramid draws pyramids instead of bars 

Pie Chart displays the proportion of counts or measures (described below) falling 
within the category 


In the above chart types, the height or size of the display element represents: 


The count in that category. You specify one or two numeric or string variables
with a few distinct categories, and SYSTAT counts the number of cases in each
category (or cross-classification).

The mean or median of the values in that category. You specify a quantitative 
variable and one or two stratifying variables with numeric or string codes. 
SYSTAT determines the median or average of the values of the quantitative 
variable for each category of one stratifying variable (or the cross-classification 
of two variables). 

The percentage of cases in that category. You select the Display values as a 
percentage of sum option. If you specify a categorical variable only, SYSTAT 
tallies the number of cases that have each unique value, sums the tallies, and
computes the percentage each tally is of the total. If you specify a quantitative 
variable and a stratifying variable, SYSTAT computes the average value of the 
quantitative variable within each category, sums these means, and determines the 
percentage of each mean in the total. 


A measurement or statistic input for the category. Here you input a record for each
category with the statistic (for example, total sales for a region or maximum side
effect score for a treatment) and the value of the stratifying variable.
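The arithmetic behind the count, mean or median, and percentage-of-sum heights described above can be sketched in a few lines of Python. This is a hypothetical illustration rather than SYSTAT's implementation: the function names counts_by_category, stat_by_category, and percent_of_sum, and the region and sales data, are made up for the example.

```python
from collections import Counter, defaultdict
from statistics import mean, median

def counts_by_category(categories):
    """Count the cases in each category."""
    return dict(Counter(categories))

def stat_by_category(categories, values, stat=mean):
    """Apply a statistic (mean or median) to the values within each category."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    return {cat: stat(vals) for cat, vals in groups.items()}

def percent_of_sum(heights):
    """Express each category's height (a count or a mean) as a percentage of their sum."""
    total = sum(heights.values())
    return {cat: 100.0 * h / total for cat, h in heights.items()}

# Hypothetical data: a region code and a sales figure for each case.
region = ["East", "West", "East", "North", "West", "East"]
sales = [10.0, 20.0, 14.0, 30.0, 16.0, 12.0]

print(percent_of_sum(counts_by_category(region)))       # percentage of cases
print(percent_of_sum(stat_by_category(region, sales)))  # percentage of summed means
print(stat_by_category(region, sales, stat=median))     # medians instead of means
```

With these made-up data, the category means are 12, 18, and 30 (East, West, North), so the percentages of their sum are 20, 30, and 50; the case counts of 3, 2, and 1 give 50, about 33.3, and about 16.7 percent.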

Alternative displays include multivariable bar charts, percentage bar charts, range bar
charts, stacked bar charts, divided bar charts, anchored bar charts, multiplots, star plots,
and attention maps.

When a stratifying variable has two levels, a Mirror (Dual) display is available.
Both real 3-D graphs and pseudo 3-D graphs (a 2-D plot with added depth) can be
requested for all displays except Pie Chart, which offers only the pseudo 3-D display.
You can interactively rotate the 3-D displays using the Dynamic Explorer.

You can include several variables in one display. Their bars can be laid out side by
side (or stacked on top of one another) within each category or, for repeated measures
designs, the bars for all categories of the first variable can be positioned before
those of subsequent variables. For either structure, results can be stratified, that is,
displayed in separate frames.

The data for these displays can be in cases-by-variables form (multiple cases per
category) or aggregated by category, with one record per category holding the count,
mean, or other measure and a category identifier.

In addition, SYSTAT offers a conversion from any one of these displays (except for
pie charts) to any of the other displays. After creating a chart, you can change the Chart
Type from the Options tab of the Frame group in the Graph Properties dialog box, which
opens when you double-click in the Graph Editor. A pie chart can be transformed into the
corresponding attention map (ring) from the Options tab of the Element group in the
Graph Properties dialog box.


Bar, Dot, Line, Profile, Pyramid, and Pie Charts 


Sample Displays 
Bar Chart Dialog Box


Bar produces a variety of two- and three-dimensional bar charts that display counts or 
means within categories. 


To open the Bar Chart dialog box, from the menus choose: 


Graph 
Bar Chart... 


X, Y, and Z variable(s). The structure of the resulting chart depends on the variables
that you select. For example, select a categorical x variable and a y variable to chart the
mean of the y variable within each category of the x variable, or add a z variable for a
three-dimensional chart.


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side-by-side in separate frames or overlaid in a single 
display. 

Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


Multiplot. Creates a table of two-dimensional bar charts with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 

Repeated trials. Displays means for two or more variables identified as repeated trials.
The Grouping variable(s) option allows the means to be broken down by two
between-subjects factors. Specify multiple y variables and select Repeated trials to
display repeated measures.


Counts of Y*X. Creates a temporary z variable that acts as a crosstabulation of x and y, 
to produce a three-dimensional display of the number of cases in each combination of 
x and y variables. 


Matrix columns. Creates a display where x is the variable index, y is the case index, 
and z is the data value. Select one or more z variables without selecting an x or y 
variable to enable this option. 


Display as. The default display in the presence of the z-variable is 3-D. Selecting 2-D 
Mosaic collapses a 3-D display into two dimensions. 


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side-by-side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 

Stack bars of multiple variables. If you select multiple y or z variables, select this
option to stack bars in a single frame rather than create side-by-side charts.


Range between two variables. Plot the interval between two quantitative variables 
against a categorical variable. Use two variables—such as high temperature and low 
temperature—to define a low and high value. 
In addition, you can include error bars, change coordinate systems, and customize the
layout, axes, and appearance of the charts. 
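The Counts of Y*X option amounts to a crosstabulation. As a rough Python sketch (the function name and data are hypothetical, and this is not SYSTAT's implementation), the temporary z variable holds one count per x-by-y cell; in this sketch, combinations that never occur get a count of 0.

```python
from collections import Counter


def crosstab_counts(x, y):
    """Count the cases in each (x, y) combination over the full x-by-y grid."""
    counts = Counter(zip(x, y))
    x_levels = sorted(set(x))
    y_levels = sorted(set(y))
    return {(xi, yi): counts.get((xi, yi), 0) for xi in x_levels for yi in y_levels}


# Hypothetical records: sex and a grouped education label.
sex = ["F", "F", "M", "M", "M"]
educ = ["HS grad", "College grad", "HS grad", "HS grad", "Degree +"]

# Each count would be the height of one bar in the 3-D display.
for (s, e), n in crosstab_counts(sex, educ).items():
    print(s, e, n)
```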


Bar Options


The following options are available when creating a bar chart:


Display the values as a percentage of the sum. Creates a chart that shows values as 
a percentage of the sum. For example, instead of showing how many people in your 
data set attained a certain level of education, a bar chart shows what percentage of 
them reached those levels. 

Display the median instead of the mean for height. For two-dimensional charts 
involving a y-variable or three-dimensional charts involving a z-variable, select 
this option to display the median of the y or z variable instead of the mean. 

Add pseudo 3-D depth. Adds a 3-D effect to a two-dimensional chart. 

Anchor the bars to Y. Use this option to compare a variable against a standard 
level. The bars reach up or down relative to the value you enter. For example, you 
could chart the amount of rainfall in several regions and specify the mean as an 
anchor to emphasize the areas with the most extreme precipitation conditions. 

Width of bars. Changes the thickness of the bars. Setting the width to 0.90
produces extremely wide bars, while setting it to 0.10 produces very thin
bars. The default is 0.50.

Label the bars. Displays the exact value represented by each bar. 


Label size. Increase or decrease the font size for the label.
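The Anchor the bars to Y option can be pictured as drawing each bar between the anchor and the plotted value. The Python sketch below (hypothetical function name and rainfall figures, not SYSTAT code) uses the mean as the anchor, as in the rainfall example above.

```python
from statistics import mean


def anchored_bars(values, anchor):
    """Return (start, end, direction) for each bar drawn from the anchor.

    Values above the anchor give bars that reach up; values below reach down.
    A value equal to the anchor gives a zero-length bar (labeled "up" here).
    """
    return [(anchor, v, "up" if v >= anchor else "down") for v in values]


# Hypothetical rainfall totals by region.
rainfall = {"North": 30.0, "South": 10.0, "East": 20.0, "West": 40.0}
anchor = mean(list(rainfall.values()))  # 25.0

for region, (start, end, direction) in zip(rainfall, anchored_bars(rainfall.values(), anchor)):
    print(f"{region}: {direction} from {start} to {end} (length {abs(end - start)})")
```

The longest bars (South and West, 15 units each) mark the regions farthest from the mean in either direction.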


Dot Chart Dialog Box 


Dot produces a variety of two- and three-dimensional charts showing dots to represent
counts or means within categories. 


To open the Dot Chart dialog box, from the menus choose: 


Graph
Summary Charts
Dot...


X, Y, and Z variable(s). The structure of the resulting chart depends on the variables 
that you select. For example, select a categorical x variable and a y variable to chart the 
mean of the y variable within each category of the x variable, or add a z variable for a 
three-dimensional chart. 

Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side-by-side in separate frames or overlaid in a single 
display. 
Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot.
To use this option, you should select a grouping variable with only two categories. 


Multiplot. Creates a table of two-dimensional dot charts with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 


Repeated trials. Displays means for two or more variables identified as repeated trials.
The Grouping variable(s) option allows the means to be broken down by two
between-subjects factors. Specify multiple y variables and select Repeated trials to
display repeated measures.


Counts of Y*X. Creates a temporary z variable that acts as a crosstabulation of x and y,
to produce a three-dimensional display of the number of cases in each combination of
x and y variables.


Matrix columns. Creates a display where x is the variable index, y is the case index, 
and z is the data value. Select one or more z variables without selecting an x or y 
variable to enable this option. 


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side-by-side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 


In addition, you can include error bars, change coordinate systems, and customize the 
layout, axes, and appearance of the charts. 
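The Matrix columns layout maps a cases-by-variables table onto three coordinates. A minimal Python sketch, assuming both indices start at 1 (the function name and data are hypothetical, not SYSTAT code):

```python
def matrix_to_xyz(rows):
    """Convert a cases-by-variables table to (x, y, z) triples:
    x = variable (column) index, y = case (row) index, z = data value.
    Indices start at 1 in this sketch."""
    return [
        (var_index, case_index, value)
        for case_index, row in enumerate(rows, start=1)
        for var_index, value in enumerate(row, start=1)
    ]


# Hypothetical data: three cases measured on two z variables.
table = [[5, 7],
         [6, 9],
         [4, 8]]

for x, y, z in matrix_to_xyz(table):
    print(f"variable {x}, case {y}: {z}")
```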
Dot Options




The following options are available when creating a dot chart: 


Display the values as a percentage of the sum. Instead of counts, SYSTAT plots 
values as a percentage of the sum. 


Display the median instead of the mean for height. For two-dimensional charts 
involving a y-variable or three-dimensional charts involving a z-variable, select 
this option to display the median of the y or z variable instead of the mean. 


Add pseudo 3-D depth. Adds a 3-D effect to a two-dimensional chart. 
Line connected in left-to-right order. Connects the dots with a line, in a sequence
from left to right. 
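Choosing Display the median instead of the mean for height changes only the statistic computed within each category. The hypothetical Python sketch below (not SYSTAT code) shows why the choice can matter: one extreme value pulls the mean of a category well above its median.

```python
from collections import defaultdict
from statistics import mean, median


def height_by_category(categories, values, use_median=False):
    """Height for each category: the mean of its values, or the median
    when use_median is True."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    stat = median if use_median else mean
    return {cat: stat(vals) for cat, vals in groups.items()}


# Hypothetical data; East contains one extreme value (50).
region = ["East", "East", "East", "West", "West"]
sales = [10, 12, 50, 20, 24]

print(height_by_category(region, sales))                   # East 24, West 22
print(height_by_category(region, sales, use_median=True))  # East 12, West 22
```

Passing a list of (first variable, second variable) tuples as the categories gives the cross-classification case.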


Line Chart Dialog Box 


Line produces a variety of two- and three-dimensional charts showing lines to
represent counts or means within categories.


To open the Line Chart dialog box, from the menus choose: 
Graph 
Line Chart... 


X, Y, and Z variable(s). The structure of the resulting chart depends on the variables
that you select. For example, select a categorical x variable and a y variable to chart the 
mean of the y variable within each category of the x variable, or add a z variable for a 
three-dimensional chart. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side-by-side in separate frames or overlaid in a single 
display. 

Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 
MultiPlot. Creates a table of two-dimensional line charts with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 

Repeated trials. Displays means for two or more variables identified as repeated trials.
The Grouping variable(s) option allows the means to be broken down by two
between-subjects factors. Specify multiple y variables and select Repeated trials to
display repeated measures.

Counts of Y*X. Creates a temporary z variable that acts as a crosstabulation of x and y,
to produce a three-dimensional display of the number of cases in each combination of
x and y variables.

Matrix columns. Creates a display where x is the variable index, y is the case index,
and z is the data value. Select one or more z variables without selecting an x or y
variable to enable this option.

Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side-by-side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 

In addition, you can include error bars, change coordinate systems, and customize the 
layout, axes, and appearance of the charts. 


Line Options 


The following options are available when creating a line chart: 


Display the values as a percentage of the sum. Instead of counts, SYSTAT plots
values as a percentage of the sum.

Display the median instead of the mean for height. For two-dimensional charts
involving a y-variable or three-dimensional charts involving a z-variable, select
this option to display the median of the y or z variable instead of the mean.

Add pseudo 3-D depth. Adds a 3-D effect to a two-dimensional chart.


Profile Chart Dialog Box 


Profile produces a variety of two- and three-dimensional charts showing shaded areas
to represent counts or means within categories.


To open the Profile Chart dialog box, from the menus choose: 


Graph 
Summary Charts 
Profile... 


X, Y, and Z variable(s). The structure of the resulting chart depends on the variables 
that you select. For example, select a categorical x variable and a y variable to chart the 
mean of the y variable within each category of the x variable, or add a z variable for a 
three-dimensional chart. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side-by-side in separate frames or overlaid in a single 
display. 
Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot.
To use this option, you should select a grouping variable with only two categories. 


Multiplot. Creates a table of two-dimensional profile charts with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 

Repeated trials. Displays means for two or more variables identified as repeated trials.
The Grouping variable(s) option allows the means to be broken down by two
between-subjects factors. Specify multiple y variables and select Repeated trials to
display repeated measures.

Counts of Y*X. Creates a temporary z variable that acts as a crosstabulation of x and y,
to produce a three-dimensional display of the number of cases in each combination of 
x and y variables. 

Matrix columns. Creates a display where x is the variable index, y is the case index, 
and z is the data value. Select one or more z variables without selecting an x or y 
variable to enable this option. 

Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side-by-side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 


Stack profiles of multiple variables. If you select multiple y or z variables, select this 
option to stack profiles in a single frame rather than create side-by-side charts. 


In addition, you can include error bars, change coordinate systems, and customize the 
layout, axes, and appearance of the charts. 
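When profiles or bars of several variables are stacked, each variable's segment begins where the previous variable's segment ended within the same category. A minimal Python sketch of that bookkeeping, assuming nonnegative heights (the function name, variable names, and values are hypothetical, not SYSTAT code):

```python
def stack_segments(series):
    """series maps each variable name to its heights, one per category,
    in stacking order. Returns the (bottom, top) of every segment."""
    n_categories = len(next(iter(series.values())))
    running = [0] * n_categories
    segments = {}
    for name, heights in series.items():
        segments[name] = [(running[i], running[i] + h) for i, h in enumerate(heights)]
        running = [r + h for r, h in zip(running, heights)]
    return segments


# Hypothetical heights of two variables in three categories.
stacked = stack_segments({"Y1": [3, 5, 2], "Y2": [4, 1, 6]})
print(stacked["Y1"])  # [(0, 3), (0, 5), (0, 2)]
print(stacked["Y2"])  # [(3, 7), (5, 6), (2, 8)]
```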
Profile Options




The following options are available when creating a profile chart: 


Display the values as a percentage of the sum. Instead of counts, SYSTAT plots
values as a percentage of the sum.

Display the median instead of the mean for height. For two-dimensional charts
involving a y-variable or three-dimensional charts involving a z-variable, select
this option to display the median of the y or z variable instead of the mean.

Add pseudo 3-D depth. Adds a 3-D effect to a two-dimensional chart.


Pyramid Chart Dialog Box 


Pyramid produces a variety of two- and three-dimensional charts showing pyramids
representing counts or means within categories.


To open the Pyramid Chart dialog box, from the menus choose: 


Graph 
Summary Charts 
Pyramid... 


X, Y, and Z variable(s). The structure of the resulting chart depends on the variables 
that you select. For example, select a categorical x variable and a y variable to chart the 
mean of the y variable within each category of the x variable, or add a z variable for a 
three-dimensional chart. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side-by-side in separate frames or overlaid in a single 
display. 




Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


Multiplot. Creates a table of two-dimensional pyramid charts with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 

Repeated trials. Displays means for two or more variables identified as repeated trials.
The Grouping variable(s) option allows the means to be broken down by two
between-subjects factors. Specify multiple y variables and select Repeated trials to
display repeated measures.

Counts of Y*X. Creates a temporary z variable that acts as a crosstabulation of x and y,
to produce a three-dimensional display of the number of cases in each combination of
x and y variables.

Matrix columns. Creates a display where x is the variable index, y is the case index,
and z is the data value. Select one or more z variables without selecting an x or y
variable to enable this option.

Overlay multiple graphs into a single frame. You can display multiple graphs in the
same frame rather than side-by-side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 

In addition, you can include error bars, change coordinate systems, and customize the 
layout, axes, and appearance of the charts. 
Pyramid Options




The following options are available when creating a pyramid chart: 


Display the values as a percentage of the sum. Creates a chart that shows values as
a percentage of the sum.

Display the median instead of the mean for height. For two-dimensional charts
involving a y-variable or three-dimensional charts involving a z-variable, select
this option to display the median of the y or z variable instead of the mean.
Add pseudo 3-D depth. Adds a 3-D effect to a two-dimensional chart. 
Anchor the pyramids to Y. Use this option to compare a variable against a standard 
level. The pyramids reach up or down relative to the value you enter. For example, 
you could chart the amount of rainfall in several regions and specify the mean as 
an anchor to emphasize the areas with the most extreme precipitation conditions. 
For two-dimensional charts, the pyramid base rests at the anchor with the apex 
reaching to the plotted value. For three-dimensional charts, pyramids below the 
anchor have their apex at the anchor with the pyramid base at the plotted value. 

Width of pyramids. Changes the thickness of the pyramids. Setting the width to
0.90 produces extremely wide pyramids, while setting it to 0.10 produces
very thin pyramids. The default is 0.50.

Label the pyramids. Displays the exact value represented by each pyramid.

Label size. Increase or decrease the font size for the label.


Pie Chart Dialog Box 


Although pie charts present the same information as the other charts discussed in this
chapter, the format differs. Bar, dot, line, profile, and pyramid charts use unique
elements in a common coordinate system to represent counts and means. Pie charts, on
the other hand, present these data as wedges of a circular area. As a result, the options
available for pie charts differ from those associated with the other charts.

To open the Pie Chart dialog box, from the menus choose: 


Graph 
Pie Chart... 


Pie Chart produces pie charts that display counts or means as a proportion of the total. 


Category variable(s). One or more categorical variables that define the categories. 
Counts for each category are displayed as a proportion of the total count. A separate 
pie is produced for each variable selected. 


Slice variable(s). Variable to be summarized within each category. The mean of the 
selected variable is displayed as a proportion of the sum of means. A separate pie is 
produced for each variable selected. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations are plotted side-by-side. 


Separate a slice from the pie. Select this to emphasize a certain category. 


Slice number. Specify which slice to separate. 
Attention map (Ring). The ring plot, or attention map, draws a set of concentric rings
beginning with the smallest ring for the first category. The radius of each ring is the sum
of the previous radii plus the amount due to the corresponding category. This plot type
is sometimes used by newspapers as an attention map showing a paper's relative rates
of reporting from local to international news.


Add pseudo 3-D depth. This feature adds depth to a two-dimensional pie chart. 
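A minimal Python sketch of the attention map radii, assuming the radii grow as a running total of the category amounts (this is our reading of the ring description above, not SYSTAT's code; the function name and story counts are hypothetical):

```python
from itertools import accumulate


def ring_radii(amounts):
    """Outer radius of each ring, innermost (first category) first,
    computed as a running total of the category amounts."""
    return list(accumulate(amounts))


# Hypothetical story counts, ordered from local to international news.
stories = [12, 8, 5, 3]
print(ring_radii(stories))  # [12, 20, 25, 28]
```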


Pie Options 




The following scaling and labelling options are available when creating a pie chart: 


Display scale. A scale labels the categories in the chart. Deselect this option to hide
the scale.

Chapter 2 


Scale transformation. You can specify a power or log transformation for each
slice.

Label the value of each slice. Displays the exact value represented by each slice.

Size. Increase or decrease the font size for the label.


Using Commands 


To create a graph, first tell SYSTAT what data to use: 


USE FILENAME 


Then specify the name of the procedure (BAR, DOT, LINE, PROFILE, or PYRAMID). In
displaying the command syntax here, we use BAR; you can substitute any one of the
other procedures in its place.

2-D display of counts        BAR xvarlist / options

3-D display of counts        BAR . * yvar * xvar / options

2-D display of means         BAR yvarlist * xvar / options

3-D display of means         BAR zvarlist * yvar * xvar / options

Repeated measures display    BAR yvarlist / REPEAT options
                             BAR yvarlist / REPEAT options,
                                 GROUP=grpvar1,grpvar2
                             BAR yvarlist / REPEAT OVERLAY options,
                                 GROUP=grpvar1,grpvar2

Matrix display               BAR zvarlist / MATRIX options
                             BAR zvarlist / MATRIX TILE options

Multiplot display            BAR xvarlist / GROUP=grpvar1,grpvar2,
                                 MULTIPLOT


The syntax for the PIE command is:


Pie chart of counts     PIE category-varlist / options
Pie chart of means      PIE slice-varlist * categoryvar / options


Examples for Counts


When you specify one or two categorical variables (x variables) only, Bar, Dot, Line, 
Profile, Pyramid, and Pie Chart (for Category variable(s)) produce a display of counts. 
Example 1
Two-Dimensional Charts for Counts 


For the following example, we create two bar charts displaying the number of people 
attaining each level (coded 1 through 7) of education. The second chart extends the plot 
scale and inserts a count at the top of each bar. 


The input is: 


USE SURVEY2
BEGIN
BAR EDUCATN / LOC=-3IN,0IN
BAR EDUCATN / LABEL YMAX=115 LOC=3IN,0IN
END


The output is: 


[Figure: two bar charts of counts by EDUCATN; the second shows a count label on each bar]


For code 3, there are 98 people (a 3 means high school graduate), while for code 1,
there are only 3 people. To make the chart more informative, combine some of the
categories.
The input is:

USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BAR EDUCATN$ / FILL=3 XLAB='EDUCATION'
The output is: 


[Figure: bar chart of counts by EDUCATION for the five recoded categories]


You can use string or numeric variables to specify categories. You can also request up to
256 categories, but you will need an ultra-high-resolution device to display the labels
clearly for that many categories.


SYSTAT automatically prints category labels at a slant if there is not enough room
to print the labels horizontally. If there are more than 10 categories on the horizontal
axis, SYSTAT shrinks the size of the lettering to fit the scale.
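The recoding step in this example, which collapses seven education codes into five ordered labels and counts each label, can be mimicked in Python. The mapping below copies the RECODE and ORDER commands above; the function name and the sample codes are hypothetical, and this is an illustration rather than SYSTAT code.

```python
from collections import Counter

# Code-to-label mapping and label order copied from the RECODE and ORDER
# commands in this example.
EDUC_LABELS = {1: "HS dropout", 2: "HS dropout", 3: "HS grad",
               4: "Some college", 5: "College grad",
               6: "Degree +", 7: "Degree +"}
ORDER = ["HS dropout", "HS grad", "Some college", "College grad", "Degree +"]


def recode_and_count(codes):
    """Map each education code to its label, then count cases per label
    in ORDER sequence (labels with no cases get 0)."""
    tallies = Counter(EDUC_LABELS[code] for code in codes)
    return [(label, tallies[label]) for label in ORDER]


# Hypothetical codes for ten respondents.
codes = [1, 2, 3, 3, 3, 4, 5, 6, 7, 7]
print(recode_and_count(codes))
```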
Pseudo 3-D Display

Here, we turn the same counts into a pseudo 3-D line display.

The input is:

LINE EDUCATN$ / THREED XLAB='EDUCATION'


The output is: 


[Figure: pseudo 3-D line chart of Count by EDUCATION]
Other Displays for the Same Counts


Now let us look at how the other displays portray the same counts. 


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad',
'Some college', 'College grad', 'Degree +'
BEGIN
DOT EDUCATN$ / XLAB=' ' SYMB=1 FILL=1 SIZE=2 TITLE="Dot",
LOC=-5.5IN,5IN
LINE EDUCATN$ / XLAB=' ' YLAB=' ' TITLE="Line" LOC=0IN,5IN
PIE EDUCATN$ / SLICE=3 TITLE="Pie" LOC=-4IN,-2IN
PRO EDUCATN$ / XLAB=' ' YLAB=' ' TITLE="Profile",
LOC=5.5IN,5IN
PYR EDUCATN$ / XLAB=' ' TITLE="Pyramid" LOC=4IN,-2IN
END
The output is: 
[Figure: Dot, Line, Pie, Profile, and Pyramid charts of the same education counts]
For a pie chart of counts, SYSTAT tallies the number of cases in each category,
converts each count to a proportion of the total count, and shows that proportion with 
a wedge. 
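As an illustration of that calculation (a Python sketch, not SYSTAT code; the function name is hypothetical), the five education counts from the MYEDUC file shown below, which total 256, convert to proportions and wedge angles like this:

```python
def pie_wedges(counts):
    """Return {category: (proportion, angle in degrees)} for a pie of counts."""
    total = sum(counts.values())
    return {cat: (n / total, 360 * n / total) for cat, n in counts.items()}


# Counts from the MYEDUC file; they total 256 cases.
counts = {"HS Dropout": 50, "HS Grad": 98, "Some college": 45,
          "College Grad": 41, "Degree +": 22}

for cat, (share, angle) in pie_wedges(counts).items():
    print(f"{cat:13s} {share:6.1%} {angle:6.1f} degrees")
```

For example, the 98 high school graduates make up 98/256 of the cases, a wedge of about 137.8 degrees.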


Using Counts as Input 


If you have already tallied a variable, you can produce these same displays directly 
from the counts in the categories. For example, create a file named MYEDUC that has 
5 records instead of the 256 records in SURVEY2: 


EDUCATION       COUNT
HS Dropout         50
HS Grad            98
Some college       45
College Grad       41
Degree +           22


Before requesting displays, identify the variable COUNT as a frequency variable: 


USE MYEDUC 
FREQ COUNT 


Now request your display. 
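A frequency variable tells SYSTAT that each record stands for COUNT cases, so tallies from the aggregated file should match tallies from the raw file. The Python sketch below (hypothetical function name, not SYSTAT code) checks that equivalence for the MYEDUC counts by expanding each record into individual cases.

```python
from collections import Counter


def tally(categories, freq=None):
    """Tally categories; if freq is given, record i counts as freq[i] cases."""
    weights = freq if freq is not None else [1] * len(categories)
    totals = Counter()
    for cat, w in zip(categories, weights):
        totals[cat] += w
    return totals


education = ["HS Dropout", "HS Grad", "Some college", "College Grad", "Degree +"]
count = [50, 98, 45, 41, 22]

aggregated = tally(education, freq=count)

# Expand each aggregated record into n individual cases, as in the raw file.
raw_cases = [cat for cat, n in zip(education, count) for _ in range(n)]
raw = tally(raw_cases)

print(aggregated == raw, sum(raw.values()))  # True 256
```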


Example 2 
Two-Dimensional Charts for Counts of Multiple Variables 


It is easy to specify displays for several variables in one request. In this example, we
do this for education, sex, age, marital status, and employment status as recorded in the
SURVEY2 data file. The five graphs appear on the screen together (or on a single page
if printed).
The input is:


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
RECODE AGE$ = AGE / ..29 = '18 to 29',30 .. 45='30 to 45',
46 .. 64='46 to 64',65 .. ='65 & Over'
ORDER AGE$ / SORT='18 to 29','30 to 45','46 to 64','65 & Over'
RECODE EMPLOY$ = EMPLOY / 1='Full time',
2='Part time',3='Unemployed'
ORDER EMPLOY$ / SORT='Full time','Part time','Unemployed'
LINE EDUCATN$ SEX$ AGE$ MARITAL$ EMPLOY$
The output is:


[Figure: five line charts of Count, one each for EDUCATN$, SEX$, AGE$, MARITAL$, and EMPLOY$]


This is the default layout for five graphs. Similar plots result for bar, dot, profile, and 
pyramid charts. 


Example 3 
Three-Dimensional Charts for Counts 


Now we create real 3-D displays. We use sex and education simultaneously to define 
categories and count the people within each category. We continue using the SURVEY2 
data (preserving the Label and Category settings used in Example 2). 


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad',
'Some college',
'College grad', 'Degree +'
BAR . * EDUCATN$ * SEX$ / AXES=BOOK XGRID YGRID ZGRID
Note that to request a 3-D display of counts via commands, you need to type a period
(.) before the first asterisk (*). 


The output is: 


The tallest bars are for the high school graduates (59 females and 39 males). 


More 3-D Displays 


Dot, Line, Profile, and Pyramid can also display data in 3-D. Dot produces small 
"cubes" hanging in space. The line plot in 3-D becomes a "ribbon" plot. 
The input is:


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BEGIN
DOT . * EDUCATN$ * SEX$ / YLAB='EDUCATN' XGRID YGRID ZGRID,
AXES=BOOK SYM=1
LINE . * EDUCATN$ * SEX$ / YLAB='EDUCATN' XGRID YGRID ZGRID,
AXES=BOOK,
TITLE='Line' LOC=4IN,3IN
PRO . * EDUCATN$ * SEX$ / YLAB='EDUCATN' XGRID YGRID ZGRID,
AXES=BOOK,
TITLE='Profile' LOC=-4IN,-4IN
PYR . * EDUCATN$ * SEX$ / YLAB='EDUCATN' XGRID YGRID ZGRID,
AXES=BOOK,
TITLE='Pyramid' LOC=4IN,-4IN
END
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The output is: 


Example 4 
Collapsing a 3-D Display into 2-D 


Using SURVEY2, we set SEX$ as the grouping variable and overlay multiple graphs in
a single frame, displaying the same counts as in Example 3 in two dimensions.

The input is:

USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BAR EDUCATN$ / GROUP=SEX$ XLAB='EDUCATN' OVERLAY YMAX=65 FILL=1,2


The output is:

[Figure: bar chart of counts by EDUCATN, with Female and Male bars overlaid in one frame]

More Collapsed Displays 


Collapsed dot, line, profile, and pyramid displays follow. To save space, we omit the
plot legends by specifying LEGEND=NONE.

The input is:

USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BEGIN
DOT EDUCATN$ / GROUP=SEX$ XLAB='EDUCATN' OVERLAY YMAX=65,
TITLE='Dot' LOC=-3IN,3IN LEGEND=NONE SYM=4,5
LINE EDUCATN$ / GROUP=SEX$ XLAB='EDUCATN' OVERLAY YMAX=65,
TITLE='Line' LOC=3IN,3IN LEGEND=NONE DASH=1,7
PRO EDUCATN$ / GROUP=SEX$ XLAB='EDUCATN' OVERLAY YMAX=65,
TITLE='PROFILE' LOC=-3IN,-4IN LEGEND=NONE,
FILL=1,2
PYR EDUCATN$ / GROUP=SEX$ XLAB='EDUCATN' OVERLAY YMAX=65,
TITLE='Pyramid' LOC=3IN,-4IN LEGEND=NONE,
FILL=1,2
END


The output is:

Example 5
Stratifying by Variables

This example is a variation of crosstabulation and collapsed displays. We display the
counts for males and females in separate frames. Note that the data within each frame
determine the plot scale.


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BAR EDUCATN$ / GROUP=SEX$ XLAB='EDUCATN'


The output is:

[Figure: bar charts of counts by EDUCATN in two frames, titled Female and Male, each scaled to its own data]

More Displays


The other univariate graphs display the same counts as the bar chart. 


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BEGIN
DOT EDUCATN$ / GROUP=SEX$ YMAX=70 LOC=.5IN,9IN,
SIZE=2 XLAB=' ' YMAX=70 YFOR=0 SYMBOL=1
LINE EDUCATN$ / GROUP=SEX$ YMAX=70 LOC=8IN,9IN,
XLAB=' ' YMAX=70 YFOR=0
PRO EDUCATN$ / GROUP=SEX$ YMAX=70,
LOC=.5IN,4.6IN XLAB=' ' YMAX=70 YFOR=0
PYR EDUCATN$ / GROUP=SEX$ YMAX=70 LOC=8IN,4.6IN,
XLAB=' ' YMAX=70 YFOR=0
PIE EDUCATN$ / GROUP=SEX$ SLICE=3,
LOC=4IN,.6IN
END

The output is:


Examples for Means 


For the displays in the previous section, SYSTAT simply tallied the number of cases
within each category or cross-classification of two categories. At first glance, the
displays in this section appear similar; however, the height of each object is a mean,
not a count. For these displays, enter a y variable to be averaged for each x variable.
For pie charts, enter a slice variable to be averaged for each category variable.

Example 6
Two-Dimensional Charts for Means 


In this example, we plot average personal income within each category of education.
For the first display, the categories are defined by the values of EDUCATN stored in
the SURVEY2 file. In the second graph, we collapse the seven codes into five
categories and assign names to these categories.
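The only change from the count displays is what sets each height: a count tallies cases per category, while a mean averages the y variable within each category. A minimal Python sketch on hypothetical (category, income) pairs makes the contrast concrete; the function name is illustrative, not part of SYSTAT.

```python
from collections import defaultdict
from statistics import mean


def counts_and_means(pairs):
    """Return ({category: number of cases}, {category: mean of y}) for (category, y) pairs."""
    groups = defaultdict(list)
    for category, y in pairs:
        groups[category].append(y)
    counts = {c: len(ys) for c, ys in groups.items()}
    means = {c: mean(ys) for c, ys in groups.items()}
    return counts, means


# Hypothetical (education label, INCOME in thousands) pairs, not SURVEY2 values.
pairs = [("HS grad", 20), ("HS grad", 30), ("HS grad", 40), ("Degree +", 60)]
counts, means = counts_and_means(pairs)
print(counts)  # {'HS grad': 3, 'Degree +': 1}
print(means)   # {'HS grad': 30, 'Degree +': 60}
```

With these made-up values, a count chart makes "HS grad" three times as tall as "Degree +", while a means chart makes it half as tall.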


The input is: 


USE SURVEY2
BEGIN
BAR INCOME * EDUCATN / LABEL LOC = -3,0
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college',
5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad',
'Some college',
'College grad', 'Degree +'
BAR INCOME * EDUCATN$ / FILL=0 SERROR XLAB='EDUCATN',
LOC =
END


SYSTAT assumes that the first variable specified is quantitative.

The output is:

[Figure: bar charts of mean INCOME by EDUCATN; the second uses the five recoded categories and shows standard error bars]


In the second plot, the length of the standard error bars is related to the size of the
means (for example, large means have tall error bars).
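Why can a large mean come with a tall error bar? The standard error is the standard deviation divided by the square root of the number of cases, so when the spread of income grows with its level, the error bar grows too. A small Python sketch with hypothetical values follows; we assume SERROR uses the usual n - 1 sample standard deviation, which the text does not state.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(values):
    """Return (mean, standard error of the mean), using the n - 1 sample SD."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical incomes: the second group has a larger mean and a larger spread.
low_group = [10, 20, 30]
high_group = [50, 70, 90]
for group in (low_group, high_group):
    m, se = mean_and_se(group)
    print(m, round(se, 2))
```

Here the second group's standard deviation is twice the first's, so its error bar is twice as long.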
Square Root Transformation 


Here, we take the square root of each income value before SYSTAT computes the
mean.


The input is: 
BAR INCOME * EDUCATN$ / SERROR YPOW XLAB='EDUCATN' FILL=0

The output is:


The difference between the lengths of the longest and shortest error bars is smaller after
the transformation, but the length of an error bar still bears some relation to the size of
the mean. You may want to try a log transformation.
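The effect of each transformation can be checked numerically. When the spread of a variable is proportional to its level, a square root reduces the difference in error-bar length and a log removes it. The Python sketch below uses hypothetical values built so that one group is exactly nine times the other; real survey data will not behave this neatly, and the helper names are ours.

```python
from math import log, sqrt
from statistics import stdev


def standard_error(values):
    """Standard error of the mean with the n - 1 sample SD."""
    return stdev(values) / sqrt(len(values))


def se_ratio(transform, small, large):
    """Ratio of error-bar lengths (large group / small group) after a transform."""
    return (standard_error([transform(v) for v in large])
            / standard_error([transform(v) for v in small]))


# Hypothetical incomes: every value in `large` is 9 times the matching value in `small`.
small = [4, 9, 16]
large = [36, 81, 144]
for name, f in [("none", lambda v: v), ("sqrt", sqrt), ("log", log)]:
    print(name, round(se_ratio(f, small, large), 3))
# ratios are about 9, 3, and 1
```

Multiplying by 9 scales the raw spread by 9, the square-root spread by 3, and only shifts the logs, which leaves their spread unchanged.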


Pseudo 3-D Display 


A pseudo 3-D display of average income is shown below. 


The input is: 


BAR INCOME * EDUCATN$ / THREED XLAB='EDUCATN'

The output is:

[Figure: pseudo 3-D bar chart of mean INCOME by EDUCATN]


More Displays 


The other univariate graphs display average income as shown below. 


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/ 1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BEGIN
DOT INCOME * EDUCATN$ / SERROR SYMBOL=1 XLAB=' ' TITLE='Dot',
LOC=-6IN,4IN
LINE INCOME * EDUCATN$ / TITLE='Line' XLAB='EDUCATN',
YLAB=' ', LOC=0IN,4IN
PRO INCOME * EDUCATN$ / FILL=4 XLAB=' ' YLAB=' ',
TITLE='Profile' LOC=6IN,4IN
PYR INCOME * EDUCATN$ / TITLE='Pyramid' XLAB='EDUCATN',
LOC=3IN,-3IN
PIE INCOME * EDUCATN$ / SLICE=4 TITLE='Pie', LOC=-5IN,-3IN
END

The output is:

[Figure: Dot, Line, Profile, Pyramid, and Pie displays of mean INCOME by EDUCATN]


For a pie chart of means, SYSTAT computes a mean for each category, converts each 
mean to a proportion of the sum of the means, and shows that proportion as a wedge 
of the whole. 
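The pie-of-means rule takes only a few lines to express. The Python sketch below (hypothetical means; the function name is ours) turns category means into wedge proportions and angles. Because it starts from the means, the result is the same whether the means are computed from raw cases or entered directly.

```python
def pie_wedges(means):
    """Map {category: mean} to {category: (proportion of the sum, angle in degrees)}."""
    total = sum(means.values())
    return {c: (m / total, 360.0 * m / total) for c, m in means.items()}


# Hypothetical category means (for example, average income in thousands)
wedges = pie_wedges({"A": 10, "B": 30, "C": 60})
print(wedges["B"])  # (0.3, 108.0)
```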

We could have produced the same displays by entering the means directly. Your file
would then have 5 cases instead of 256.

Example 7
Three-Dimensional Charts for Means 


To produce a 3-D means graph, enter x, y, and z variables, where z is averaged within
each combination of the x and y categories.
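Each object's height is therefore the mean of z over the cases that share one x category and one y category. A minimal Python sketch of that cell-mean computation follows, run on hypothetical (education, sex, income) rows; the names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows):
    """Average the third field of each row within each (x, y) cell."""
    cells = defaultdict(list)
    for x, y, z in rows:
        cells[(x, y)].append(z)
    return {cell: mean(zs) for cell, zs in cells.items()}


# Hypothetical rows: (EDUCATN$, SEX$, INCOME in thousands)
rows = [("HS grad", "Female", 18), ("HS grad", "Female", 22),
        ("HS grad", "Male", 30), ("Degree +", "Male", 55)]
print(cell_means(rows)[("HS grad", "Female")])  # 20
```

A cell with no cases, such as ("Degree +", "Female") here, has no mean and therefore no object in the display.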


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
PYRAMID INCOME * EDUCATN$ * SEX$ / XGRID YGRID ZGRID AXES=BOOK,
YLAB='EDUCATN'


The output is:

More Displays
Following are 3-D means for the other displays. 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BEGIN
DOT INCOME * EDUCATN$ * SEX$ / XGRID YGRID ZGRID AXES=BOOK,
LOC=-3IN,3.5IN TITLE='Dot',
YLAB='EDUCATN'
LINE INCOME * EDUCATN$ * SEX$ / XGRID YGRID ZGRID AXES=BOOK,
LOC=3IN,3.5IN TITLE='Line',
YLAB='EDUCATN'
PRO INCOME * EDUCATN$ * SEX$ / XGRID YGRID ZGRID AXES=BOOK,
LOC=-3IN,-3.5IN,
TITLE='Profile', YLAB='EDUCATN'
BAR INCOME * EDUCATN$ * SEX$ / XGRID YGRID ZGRID AXES=BOOK,
LOC=3IN,-3.5IN TITLE='Bar',
YLAB='EDUCATN'
END

The output is:

[Figure: 3-D Dot, Line, Profile, and Bar displays of mean INCOME by EDUCATN$ and SEX$]


Example 8 

Grouping by Stratifying Variables 
In this example, we display means side-by-side in a 2-D format. To produce this graph,
identify SEX$ as a grouping variable, rather than a z variable, and overlay multiple
graphs into a single frame.

The input is:
USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
DOT INCOME * EDUCATN$ / GROUP=SEX$ OVERLAY SIZE=3 SYM=21,20,
FILL=1 XLAB='EDUCATN'
The output is:

[Figure: overlaid dot plot of mean INCOME by EDUCATN, with separate symbols for females and males]

More Displays 


You can display these same means as line, bar, profile, or pyramid graphs. However, 
be careful when overlaying multiple graphs into a single frame with a profile display, 
because larger values for the last group drawn may mask those for another group. Here, 
males tend to earn more than females in four of the five categories, so we want the
averages for females to be drawn last. To do this, we order the categories of SEX$.

The input is:

USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BEGIN
LINE INCOME * EDUCATN$ / GROUP=SEX$ OVERLAY SERROR DASH=1,2,
XLAB=' ' LEGEND=NONE LOC=-3IN,3IN
BAR INCOME * EDUCATN$ / GROUP=SEX$ OVERLAY XLAB=' ',
LEGEND=NONE SERROR LOC=3IN,3IN
ORDER SEX$ / SORT='Male', 'Female'
PRO INCOME * EDUCATN$ / GROUP=SEX$ OVERLAY XLAB=' ',
FILL=2,1 LEGEND=NONE LOC=-3IN,-3IN
PYR INCOME * EDUCATN$ / GROUP=SEX$ OVERLAY XLAB=' ',
LEGEND=NONE LOC=3IN,-3IN
ORDER
END

The output is:

Displaying Graphs in Separate Frames
You can display the graphs in separate frames. 


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
DOT INCOME * EDUCATN$ / GROUP=SEX$ SIZE=2 FILL=1 XLAB='EDUCATN'


The output is: 


[Figure: dot plots of mean INCOME by EDUCATN in separate frames for Female and Male]


The other displays draw the same means as shown below.

The input is:
BEGIN
LINE INCOME * EDUCATN$ / GROUP=SEX$ YLAB=' ' LOC=-2.7IN,2IN,
XLAB=' ' YMAX=70
BAR INCOME * EDUCATN$ / GROUP=SEX$ YLAB=' ' LOC=3IN,2IN,
XLAB=' ' YMAX=70
PRO INCOME * EDUCATN$ / GROUP=SEX$ YLAB=' ' LOC=-2.7IN,-2IN,
XLAB=' ' YMAX=70
PYR INCOME * EDUCATN$ / GROUP=SEX$ YLAB=' ' LOC=3IN,-2IN,
XLAB=' ' YMAX=70
END

The output is:

Example 9
Dual Displays 


When you have two stratifying variables and one has only two levels, the means of the
quantitative variable can be shown in a dual display.


The input is: 


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
PROFILE INCOME * EDUCATN$ / DUAL=SEX$ FILL=1,2 XLAB='EDUCATN'

The output is:


You can also use dual displays for bar, line, dot and pyramid displays. 


The input is: 


BEGIN
BAR INCOME * EDUCATN$ / DUAL=SEX$ SYMBOL=1,2 XLAB=' ',
LEGEND=NONE LOC=-3IN,3IN FILL=1,2
LINE INCOME * EDUCATN$ / DUAL=SEX$ XLAB=' ' LEGEND=NONE,
LOC=3IN,3IN DASH=1,2
DOT INCOME * EDUCATN$ / DUAL=SEX$ SYM=21,20 SIZE=4 XLAB=' ',
LEGEND=NONE LOC=-3IN,-3IN
PYR INCOME * EDUCATN$ / DUAL=SEX$ FILL=1,2 LEGEND=NONE,
LOC=3IN,-3IN XLAB=' '
END

The output is:

Examples for Multivariable Displays
(Including Repeated Measures) 


SYSTAT provides multivariable displays and repeated measures plots. To understand 
these terms, look at this data structure. The data are total CD sales in U.S. dollars. Each 
case (row in the data file) is a store and includes its sales for 1986, 1990, and 1994 plus
a character grouping code for the city.

                    CD SALES
CITY              1986    1990    1994
                   v1      v2      v3
Paris     (g1)
Tokyo     (g2)            DATA
New York  (g3)


Now suppose that, for each year, SYSTAT computes the average CD sales (that is, nine 
means—three variables by three levels of a stratifying variable). For the display, 
SYSTAT can group these means by city or by year: 


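[Figure: the nine means arranged two ways, in panels labeled Repeated Measures and Multivariable; one panel places the cities (group1 = Paris, group2 = Tokyo, group3 = New York) along the axis, the other places the years (var1 = 1986, var2 = 1990, var3 = 1994)]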
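The two layouts differ only in which factor forms the clusters. As an illustration, this Python sketch takes a hypothetical table of nine means keyed by (city, year) and regroups it both ways. The values are invented, and `arrange` is our own name.

```python
from itertools import product

CITIES = ["Paris", "Tokyo", "New York"]
YEARS = [1986, 1990, 1994]
# Hypothetical mean CD sales for each (city, year) pair.
MEANS = {(c, y): 100 * (i + 1) + 10 * j
         for (i, c), (j, y) in product(enumerate(CITIES), enumerate(YEARS))}


def arrange(means, by_city=True):
    """Cluster the nine means by city (years within each) or by year (cities within each)."""
    clusters = {}
    for (city, year), m in means.items():
        key, member = (city, year) if by_city else (year, city)
        clusters.setdefault(key, []).append((member, m))
    return clusters


print(arrange(MEANS)["Paris"])              # [(1986, 100), (1990, 110), (1994, 120)]
print(arrange(MEANS, by_city=False)[1990])  # [('Paris', 110), ('Tokyo', 210), ('New York', 310)]
```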


Multivariable displays. Follow the instructions for 2-D means to display means of up 
to 12 quantitative variables against a categorical variable. As with a single variable, 
SYSTAT computes the mean of each quantitative variable for each category. For 
example, specify several y variables and one x variable. If you select Overlay multiple 
graphs into a single frame, the means of the y variables are displayed on a common scale 
in a single frame. If you do not select the overlay option, the means are plotted in 
separate frames using a scale determined by the values within each frame. 


Using commands: 


BAR yvarlist * xvar / OVERLAY 
BAR yvarlist * xvar 


Bar, Dot, Line, Profile, Pyramid, and Pie Charts 


Repeated measures. For repeated measures data (two or more dependent variables 
measured on the same scale), specify several y variables and select Repeated Trials. 


Use the following syntax to display the means in a single frame: 


BAR yvarlist / REPEAT 


If you have one or two between-subjects factors, assign a grouping variable with or 
without selecting Overlay multiple graphs into a single frame. 


BAR yvarlist / GROUP=grpvar1,grpvar2 REPEAT
BAR yvarlist / GROUP=grpvar1,grpvar2 REPEAT OVERLAY


Example 10 
Stacked Bars 


The OURWORLD data file contains one case for each of 57 countries. We investigate 
expenditures that each country makes on education (EDUC), health (HEALTH), and 
the military (MIL), all measured in U.S. dollars per person per year. So, for each
country, there are three variables, or repeated measures. In addition, the variable
GROUP$ allows us to assign the countries (cases) to one of three categories (Europe,
Islamic, or New World). In this example, we arrange the averages of the three variables 
by group. In the repeated measures example, we display the averages by variables. 


The input is: 


USE OURWORLD
BEGIN
BAR EDUC MIL HEALTH * GROUP$ / OVERLAY SERROR,
COLOR=BLUE,RED,BLACK,
FILL=2,1,3 LEGEND=NONE,
LOC=-3IN,0IN
BAR EDUC MIL HEALTH * GROUP$ / OVERLAY SERROR YPOW,
COLOR=BLUE,RED,BLACK,
YLAB='Average Expenditure (per person)', FILL=2,1,3 LEGEND=NONE,
LOC=3IN,0IN
END

Instead of displaying bars for two or more quantitative variables side-by-side, you can
stack bars on top of each other. You can stack results for up to 12 different variables. 


The input is: 
BAR EDUC MIL HEALTH * GROUP$ / STACK COLOR=BLUE,RED,BLACK FILL=2,1,3

The output is:

Percentages


The European expenditures are greater than those for the other groups, making it hard
to assess the relative contribution of each variable within a group. If you choose to
display the values as a percentage of the sum, SYSTAT sums the means within each group
and divides each bar into segments proportional to each mean's share of that sum. Here,
we use a stacked bar chart to examine differences between the groups.
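The percentage option can be reproduced by hand: divide each mean by the group's sum of means, then stack the pieces. A minimal Python sketch with hypothetical means (not OURWORLD values) follows; the function name is our own.

```python
def percent_segments(group_means):
    """Return (variable, bottom %, top %) for each segment of one stacked percentage bar."""
    total = sum(group_means.values())
    segments, bottom = [], 0.0
    for variable, m in group_means.items():
        top = bottom + 100.0 * m / total
        segments.append((variable, bottom, top))
        bottom = top
    return segments


# Hypothetical per-person means for one group of countries
print(percent_segments({"EDUC": 20, "MIL": 50, "HEALTH": 30}))
# [('EDUC', 0.0, 20.0), ('MIL', 20.0, 70.0), ('HEALTH', 70.0, 100.0)]
```

Every group's bar ends at 100%, so the groups can be compared on relative rather than absolute spending.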


The input is: 
USE OURWORLD
BAR EDUC MIL HEALTH * GROUP$ / PERCENT STACK FILL=2,1,3,
COLOR=BLUE,RED,BLACK


The output is:

[Figure: stacked percentage bars of EDUC, MIL, and HEALTH for each GROUP$]


Relative to their expenditures on health and education, the Islamic nations spend more 
on the military than do the European and New World countries. 


Example 11 
Repeated Measures 


We continue with the same means displayed in the stacked bars examples, but do not
identify an x variable. Instead, we change the focus from groups to variables using
repeated trials for the y variables. For the first set of graphs, we ignore the group
structure.
The input is: 
USE OURWORLD
BEGIN
BAR EDUC MIL HEALTH / YPOW REPEAT SERROR XLAB=' ' FILL=0,
LOC=-3IN,2IN
DOT EDUC MIL HEALTH / YPOW REPEAT SERROR XLAB=' ' LINE,
SYM=1 SIZE=2 LOC=3IN,2IN
LINE EDUC MIL HEALTH / YPOW REPEAT SERROR XLAB=' ',
LOC=0IN,-3IN
END


The output is:

[Figure: bar, dot, and line displays of the EDUC, MIL, and HEALTH means as repeated trials]

Stratified by Group


Here, we display the repeated measures separately for each group. Stratify the means
by selecting GROUP$ as a grouping variable.

The input is:
BAR EDUC MIL HEALTH / YPOW REPEAT GROUP=GROUP$ SERROR FILL=0 ROW=1


The output is:

[Figure: repeated-measures bar charts with Trial on the x axis, in separate frames for Europe, Islamic, and New World]


Note that by default the maximum value within each group is used to set the plot scale.


All Groups within One Frame


Here, we overlay multiple graphs into a single frame to display the means for all groups 
within a single frame. 


The input is: 


BAR EDUC MIL HEALTH / YPOW REPEAT GROUP=GROUP$ SERROR OVERLAY,
XLAB=' ' FILL=2,1,3 COLOR=BLUE,RED,BLACK


The output is: 


Chapter 2 
[Figure: overlaid repeated-measures bar chart of the EDUC, MIL, and HEALTH means for all three groups]

More Displays 


DOT and LINE can also produce useful displays for repeated measures. 


The input is: 


BEGIN
DOT EDUC MIL HEALTH / YPOW REPEAT GROUP=GROUP$ FILL=1,
SYM=1,4,5 SIZE=3 COLOR=RED,BLUE,BLACK,
OVERLAY, XLAB=' ' LEGEND=NONE,
LOC=-3IN,0IN
LINE EDUC MIL HEALTH / YPOW REPEAT GROUP=GROUP$ DASH=2,1,5,
COLOR=RED,BLUE,BLACK OVERLAY XLAB=' ',
LOC=3IN,0IN LEGEND=NONE
END

The output is:


Example 12 
Repeated Measures with Two Grouping Factors and One Trial Factor 


Use REPEAT for a repeated measures design with two between-subjects factors 
(borrowed from a classic example by Winer, 1991). An experiment is repeated four 
times for each subject and the number of errors recorded as TRIAL(1), TRIAL(2), 
TRIAL(3), and TRIAL(4). Each subject is assigned to one of four experimental 
conditions formed by cross-classifying the grouping factors ANXIETY and TENSION. 


The input is: 


USE REPEAT1
CAT TENSION ANXIETY
BAR TRIAL(1 .. 4) / GROUP=ANXIETY, TENSION REPEAT XLAB=' '

All Groups within One Frame


You can show all the groups in a single frame. 


The input is: 


BAR TRIAL(1 .. 4) / GROUP=ANXIETY, TENSION REPEAT XLAB=' ' OVERLAY

The output is:

To interpret interactions in an analysis of variance (ANOVA), you may want to look
across the repeated measures or study the means organized by cell. The codes for the
grouping factors ANXIETY and TENSION are 1 and 2. By multiplying the former by
10 and adding the latter, we create four new codes (11, 12, 21, and 22) that identify
each cell in the design. Here are the same means as before, but now grouped by the new
CELL variable.
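The CELL coding can be checked in a few lines. This Python sketch mirrors LET CELL = 10*ANXIETY + TENSION and the RECODE labels used below. The function name is ours, and the scheme assumes the second factor has fewer than 10 levels so that two cells can never share a code.

```python
# Labels matching the RECODE of CELL$ in this example.
CELL_LABELS = {11: "A=lo T=lo", 12: "A=lo T=hi", 21: "A=hi T=lo", 22: "A=hi T=hi"}


def cell_code(anxiety, tension):
    """Mirror LET CELL = 10*ANXIETY + TENSION for factors coded 1 and 2."""
    return 10 * anxiety + tension


for anxiety in (1, 2):
    for tension in (1, 2):
        code = cell_code(anxiety, tension)
        print(anxiety, tension, code, CELL_LABELS[code])
```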


The input is: 


USE REPEAT1
LET CELL = 10*ANXIETY + TENSION
RECODE CELL$ = CELL / 11='A=lo T=lo', 12='A=lo T=hi',
21='A=hi T=lo', 22='A=hi T=hi'
ORDER CELL$ / SORT='A=lo T=lo', 'A=lo T=hi',
'A=hi T=lo', 'A=hi T=hi'
BAR TRIAL(1 .. 4) * CELL$ / OVERLAY FILL=2,1,3,1 XLAB='CELL'

Stratified Dot Display


The following is a dot display for the cell means. 


The input is: 
DOT TRIAL(1 .. 4) / GROUP=ANXIETY, TENSION, REPEAT, LINE, SERROR,
XLAB=' '

More Dot Displays


Here, we overlay multiple graphs into a single frame and, in the plot on the right, 
display the repeated measures against the derived variable CELL. 


The input is: 


BEGIN
DOT TRIAL(1 .. 4) / GROUP=ANXIETY, TENSION REPEAT OVERLAY,
SYM=4,1,5,6 SIZE=2, LOC=-4,0,
COLOR=BLUE,RED,BLACK,GREEN
DOT TRIAL(1 .. 4) * CELL / OVERLAY LINE SYM=4,1,5,6 SIZE=2,
COLOR=BLUE,RED,BLACK,GREEN, LOC=4,0
END

The output is:

[Figure: overlaid dot displays of the TRIAL(1 .. 4) means; left, grouped by ANXIETY and TENSION; right, plotted against CELL]


Means for TRIAL(4) are displayed along the bottom line, means for TRIAL(3) are
displayed above it, and so on.


Example 13 
Inputting One Record per Bar for Total Votes 


If you input one record for each category, you can determine the height or size of each 
display element directly. (If there are several records for each category, the values of 
the quantitative variable are averaged within each category.) This means that you can 
input any measure you want, such as total sales by province or median (instead of 
mean) income. You can also input a measure to determine the length of the error bar— 
for example, for a robust measure of spread, half the interquartile range. 

Instead of viewing the mean of values within each category, we look at the total. In 
this section, we display the total vote count within each of nine U.S. census divisions 
in the 1992 presidential election. However, we must first write one new record with the total
votes for each of the nine divisions. To do so, we start with the USSTATES file, which
has one record for each of the 48 contiguous U.S. states with the vote count (in
thousands) for George Bush, Bill Clinton, and Ross Perot and a grouping code to
indicate the state's census division (New England to Pacific).

The input is:

USE USSTATES
BY DIVISION$
SAVE USVOTES
CSTATISTICS BUSH CLINTON PEROT / SUM
USE USVOTES
ORDER DIVISION$ / SORT='Pacific', 'Mountain',
'W S Central', 'E S Central',
'W N Central', 'E N Central',
'S Atlantic', 'Mid Atlantic',
'New England'
BAR BUSH CLINTON PEROT * DIVISION$ / OVERLAY YLAB='Total',
FILL=2,1,5 XLAB=' ',
COLOR=BLUE,RED,BLACK,
HEI=2IN WID=3.2IN


The output is:

[Figure: overlaid bar chart of total votes by division, with a legend for BUSH, CLINTON, and PEROT]
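The BY and CSTATISTICS ... / SUM steps reduce the state records to one record of totals per division. A minimal Python sketch of the same summation follows, run on hypothetical state rows; these are not USSTATES values, and `division_totals` is an illustrative name.

```python
from collections import defaultdict


def division_totals(states):
    """Sum each candidate's vote count within each division."""
    totals = defaultdict(lambda: [0, 0, 0])
    for division, bush, clinton, perot in states:
        for i, votes in enumerate((bush, clinton, perot)):
            totals[division][i] += votes
    return dict(totals)


# Hypothetical states: (DIVISION$, BUSH, CLINTON, PEROT), vote counts in thousands
states = [("Pacific", 100, 150, 60), ("Pacific", 40, 50, 20), ("Mountain", 30, 25, 15)]
print(division_totals(states))
# {'Pacific': [140, 200, 80], 'Mountain': [30, 25, 15]}
```

Plotting these totals, one record per bar group, gives heights equal to the sums. If several records remained per division, SYSTAT would average them instead.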


Stacked Bars 
Now, we stack bars of multiple variables. 


The input is: 


BAR BUSH CLINTON PEROT * DIVISION$ / STACK FILL=2,1,5 XLAB=' ',
COLOR=BLUE,RED,BLACK,
HEI=2IN WID=3.2IN

The output is:


More Displays 


Dot, line, and profile charts are also useful displays for these total counts. Be careful
when you use Overlay with profile charts because it draws the variables in the order in
which you specify them. We select CLINTON first because his counts are largest, then
BUSH, and finally PEROT. When the Stack option is used, the variables can be
selected in any order.

The input is:
BEGIN
DOT BUSH CLINTON PEROT * DIVISION$ / OVERLAY LINE,
DASH=1,2,7 YLAB='Total Votes' XLAB=' ',
HEI=2IN WID=3.2IN,
LEG=NONE,
COLOR=RED,BLACK,BLUE,
TITLE='Dot',
LOC=-2.5IN,2IN
LINE BUSH CLINTON PEROT * DIVISION$ / OVERLAY,
DASH=1,2,7 YLAB='Total Votes' XLAB=' ',
HEI=2IN WID=3.2IN,
LEG=NONE,
COLOR=RED,BLACK,BLUE,
TITLE='Line',
LOC=2.5IN,2IN
PRO CLINTON BUSH PEROT * DIVISION$ / OVERLAY,
HEI=2IN WID=3.2IN,
LOC=-2.5IN,-2IN,
TITLE='Profile with overlay'
PRO BUSH CLINTON PEROT * DIVISION$ / STACK FILL=1,2,7,
YLAB='Total Votes',
XLAB=' ' HEI=2IN,
WID=3.2IN,
LEG=NONE,
COLOR=RED,BLACK,BLUE,
LOC=2.5IN,-2IN,
TITLE='Profile with stack'
END

Here, we display a multivariable bar chart as percentages. Within each division, what
percentage of the vote did each candidate receive?

First, the bars for the three variables are placed side-by-side; next, they are stacked.
Last, we display the same totals as a stacked profile chart. We continue using the
USVOTES data.


The input is: 


BAR BUSH CLINTON PEROT * DIVISION$ / YMIN=0 FILL=1,2,5,
HEI=2IN WID=3.2IN,
XLAB=' ' PERCENT STACK,
YLAB=' ' LEG=NONE,
COLOR=RED,BLACK,BLUE

More Displays and Features


This section describes an assortment of other displays that you may find useful. 


Example 14 
Range Bar Chart 


It is often helpful to compare and contrast ranges between variables in a bar chart. The
following data, found in the CITYTEMP file, consist of low and high January
temperatures for eight U.S. cities in 1993.


CITY$          HIGH   LOW
Seattle          44    34
Los Angeles      67    48
Phoenix          65    39
Denver           43    16
New Orleans      62    43
Chicago          29    14
Miami            75    59
New York         37    26

We plot the range of temperatures by city and draw a reference line at 32°.

The input is:

USE CITYTEMP
ORDER CITY$ / SORT=NONE
BAR LOW HIGH * CITY$ / RANGE YLIMIT=32 HEI=2IN,
WID=3.2IN


The output is:

[Figure: range bars from LOW to HIGH temperature (Value) for each CITY$, with a reference line at 32°]
It looks as though Denver, Chicago, and New York should be avoided in January. 
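Reading the chart this way amounts to checking which bars cross below the 32° reference line. This short Python check uses the CITYTEMP values listed above; the helper name is ours, not SYSTAT's.

```python
# January temperatures from the CITYTEMP listing: (CITY$, HIGH, LOW)
CITYTEMP = [("Seattle", 44, 34), ("Los Angeles", 67, 48), ("Phoenix", 65, 39),
            ("Denver", 43, 16), ("New Orleans", 62, 43), ("Chicago", 29, 14),
            ("Miami", 75, 59), ("New York", 37, 26)]


def ranges_below(rows, limit=32):
    """Return (city, low, high) for each city whose range dips below the reference line."""
    return [(city, low, high) for city, high, low in rows if low < limit]


for city, low, high in ranges_below(CITYTEMP):
    print(f"{city}: {low} to {high} (range {high - low})")
```

Only Denver, Chicago, and New York have January lows below freezing, which matches the reading of the chart.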


Example 15 
Anchored Bar Chart 


Anchoring bars in a bar chart is useful for profit/loss charts and other graphs that
compare a variable against a standard level. Let's look at average household income by
census division, first in a typical bar chart and then with the bars anchored at Y = 27000
to compare each average against $27,000. We start from a 2-D means graph.

The input is:

USE USSTATES
ORDER DIVISION$ / SORT=NONE
BEGIN
BAR INCOME * DIVISION$ / HEI=2IN WID=3.2IN LOC=-2.5IN,0IN
BAR INCOME * DIVISION$ / BASE=27000 HEI=2IN WID=3.2IN,
LOC=2.5IN,0IN
END


The output is:

[Figure: two bar charts of INCOME by DIVISION$; left, bars drawn from the axis; right, bars anchored at 27000]


Pyramid charts yield similar plots. 
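An anchored bar starts at the base value instead of at the axis, so the direction of each bar shows the sign of its difference from the base. A minimal Python sketch with hypothetical means follows; `anchored_bars` is an illustrative name, not a SYSTAT function.

```python
def anchored_bars(means, base=27000):
    """Return (category, bottom, top, deviation) for bars drawn from the base line."""
    return [(c, min(base, m), max(base, m), m - base) for c, m in means.items()]


# Hypothetical division means (not USSTATES values)
for bar in anchored_bars({"Division A": 31000, "Division B": 24000}):
    print(bar)
# ('Division A', 27000, 31000, 4000)
# ('Division B', 24000, 27000, -3000)
```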


Example 16 
Divided Bar Chart 


If you have data that are percentages of wholes, you can use a pie chart or a divided bar 
chart. We are not sure which chart is better, since the experimental evidence favoring 
one or the other is scant and mixed (for example, see Simkin and Hastie, 1987). In any 
case, the divided bar chart is like unrolling a pie chart into a soldier’s service bars. 
As with the pie chart, the percentage bar chart can be produced in several ways. If 
you specify only a categorical variable, SYSTAT tallies the instances of each separate 
value of the variable, sums the tallies, and finally divides the bar according to the
proportions of tallies.

If you specify a continuous variable and a categorical variable, SYSTAT computes
the mean value of the continuous variable within each separate level or category of the
categorical variable. It then sums these means and divides the bar according to the
proportions of the sum.

A divided bar chart of the percentage of states within each census division is shown 
below. We use the USSTATES data file that has one record for each of the 48 
contiguous U.S. states, follow the procedure for producing 2-D counts, and display 
values as a percentage of the sum. 


The input is: 


USE USSTATES
ORDER DIVISION$ / SORT=NONE
BAR DIVISION$ / STACK PERCENT LABEL


The output is:

[Figure: divided bar chart of DIVISION$ with segments labeled Pacific (6.3%), Mountain (16.7%), W S Central (8.3%), E S Central (8.3%), S Atlantic (16.7%), W N Central (14.6%), E N Central (10.4%), Mid Atlantic (6.3%), and New England (12.5%)]


There are three states in the Pacific region, eight in the Mountain region, four in W S
Central, and so on. The percentage for each count appears in parentheses after the name
of its division.
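The labels follow directly from the state counts. The three counts given in the text (3, 8, and 4), together with the others recovered by multiplying each displayed percentage by 48, reproduce every label. The Python sketch below checks this. It rounds halves upward so that 3/48 = 6.25% becomes 6.3%, as in the chart; that SYSTAT uses the same rounding rule is our assumption.

```python
from decimal import ROUND_HALF_UP, Decimal

# States per census division (48 contiguous states), consistent with the chart labels.
STATE_COUNTS = {"Pacific": 3, "Mountain": 8, "W S Central": 4, "E S Central": 4,
                "S Atlantic": 8, "W N Central": 7, "E N Central": 5,
                "Mid Atlantic": 3, "New England": 6}


def percent_labels(counts):
    """Percentage of the total for each category, rounded half-up to one decimal."""
    total = sum(counts.values())
    return {k: (Decimal(100) * n / total).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
            for k, n in counts.items()}


labels = percent_labels(STATE_COUNTS)
print(labels["Pacific"], labels["W N Central"])  # 6.3 14.6
```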




Example 17 
Star Plot 


You can produce dot plots in polar coordinates. This may seem strange at first, but this 
is a way to produce what are sometimes called star, snowflake, or radar plots. You can 
add grid marks inside the frame and experiment with the other options (for example, 
tick marks, error bars, axes, minima and maxima) to produce variants. 

SYSTAT's icon plots include a star icon plot, which is a close relative of the polar
display. The star icon, however, plots each case in a file as a separate star and each
variable as a point. The polar display makes each category a point and each variable a
star.

Following is an example of a polar display for the governor's salary grouped by 
census division, using the USSTATES file. We start with a dot graph for 2-D means and 
draw grid lines at $50,000 and $100,000 for the second graph. 
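A polar dot plot places each category at its own angle and uses the value as the distance from the center; connecting the points gives the star. A minimal Python sketch follows. It assumes the first category sits at angle 0 and the rest proceed counterclockwise at equal spacing, which may not match SYSTAT's orientation, and it uses hypothetical values.

```python
from math import cos, pi, sin


def polar_points(values):
    """Place n category values at equally spaced angles; the radius is the value."""
    n = len(values)
    return [(r * cos(2 * pi * k / n), r * sin(2 * pi * k / n))
            for k, r in enumerate(values)]


# Hypothetical average salaries (in thousands) for four categories
points = polar_points([100, 80, 90, 70])
```

Drawing a line from each point to the next, and from the last back to the first, produces the closed star outline that LINE adds in the commands below.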


The input is: 


USE USSTATES
ORDER DIVISION$ / SORT=NONE
BEGIN
DOT GOVSLRY * DIVISION$ / SIZE=0 POLAR LINE LOC=3IN,0IN
DOT GOVSLRY * DIVISION$ / SIZE=0 POLAR LINE YGRID,
LOC=-3IN,0IN
END


The output is:

The average governor's salary for states in the Mid Atlantic division is over $100,000,
while that for the W S Central division is less than $70,000. 


Example 18 
Attention Map 


The ring plot, or attention map, draws a set of concentric rings beginning with the
smallest ring for the first category. The radius of each ring is the radius of the previous
ring plus the amount due to the corresponding category. This plot is sometimes used
by newspapers as an attention map showing a paper’s relative rates of reporting from 
local to international news. 

For this type of chart, it's best to have an ordered scale of categories. The following 
are some typical attention data from The Cape Codder, a local newspaper. We have 
expressed the attention values as percentages. If we hadn't, SYSTAT would add them 
together and compute percentages before drawing the map, just as it does with pies. 
The results would have been the same. 


LOCUS$    PERCENT
TOWN           40
COUNTY         20
STATE          10
NATION         16
WORLD          14
The input is: 
USE CODDER
ORDER LOCUS$ / SORT=NONE
PIE PERCENT * LOCUS$ / RING

The output is:

[Figure: ring plot (attention map) with concentric rings from TOWN, the innermost, out to WORLD]
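The ring radii are running totals of the category amounts, expressed as percentages. The Python sketch below computes them for the CODDER values. It assumes the radius grows linearly with the cumulative amount rather than with area, which is our reading of the description. Raw amounts in the same proportions yield identical radii, which is why entering percentages or raw values gives the same map.

```python
from itertools import accumulate


def ring_radii(amounts):
    """Cumulative percentage for each ring, innermost (first category) first."""
    total = sum(amounts)
    return [100 * running / total for running in accumulate(amounts)]


# PERCENT values from the CODDER listing: TOWN, COUNTY, STATE, NATION, WORLD
print(ring_radii([40, 20, 10, 16, 14]))  # [40.0, 60.0, 70.0, 86.0, 100.0]
```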


Example 19 
Shaded Data Matrix 


In this example, we produce a shaded representation of a data matrix. For each data
value, MATRIX generates a row (case) index and a column (variable) index. Thus,
information is available for a 3-D display in which z = the data value, y = the case
index, and x = the variable index. We then collapse the 3-D information into a 2-D
mosaic.
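For readers who want to see the bookkeeping, here is a Python sketch of the two steps: range standardization and conversion of each cell to an (x, y, z) triple. It assumes STAND / RANGE rescales each variable to the interval from 0 to 1 (consistent with ZMAX=1.0) and that case and variable indices start at 1; the function names and the small data matrix are hypothetical.

```python
def range_standardize(rows):
    """Rescale every variable (column) to [0, 1]: (value - min) / (max - min)."""
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]


def matrix_triples(rows):
    """Return (x, y, z) = (variable index, case index, value), indices from 1."""
    return [
        (j + 1, i + 1, z)
        for i, row in enumerate(rows)
        for j, z in enumerate(row)
    ]


data = [[10, 200], [20, 100], [30, 300]]  # 3 cases, 2 variables (made up)
for x, y, z in matrix_triples(range_standardize(data)):
    print(x, y, z)
```

Each z value then sets the shade of one tile in the mosaic.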


The input is: 


USE OURWORLD 

LET (GDP_CAP, EDUC, MCDONALD) = L10(@)

STAND / RANGE

BAR LITERACY LIFEEXPF GDP_CAP EDUC URBAN BABYMORT B_TO_D /,
TILE MATRIX YREV,
AXES=NONE ZMAX=1.0,
HEI=6IN




The output is: 
(Figure: shaded data matrix, one row per case and one column per variable; x axis labeled Index of Variable.)
Example 20 
Error Bars 


SYSTAT provides conventional error bars to indicate the variability of the measure 
displayed. As an alternative to the usual vertical line bounded by horizontal ticks, 
SYSTAT also offers a box. Illustrations of both types follow. 


The input is: 


USE SURVEY2 

RECODE EDUCATN$ = EDUCATN / 1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad' 6,7='Degree +'

ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'

BEGIN
BAR INCOME * EDUCATN$ / SERROR FILL=0 LOC=-3IN,0IN XLAB='EDUCATN'
BAR INCOME * EDUCATN$ / SERROR FILL=0 ETYPE=BOX LOC=3IN,0IN,
XLAB='EDUCATN'


END 
The output is:


The top of each bar or box extends one standard error above the mean. 
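For reference, the end points of a one-standard-error bar can be computed as in this Python sketch. It assumes the usual definition of the standard error of the mean (sample standard deviation with an n - 1 divisor, divided by the square root of n); the text does not state SYSTAT's exact computation, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample SD (n - 1 divisor) over sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def error_bar(values):
    """Return (mean, lower end, upper end) for a one-SE error bar."""
    m = mean(values)
    se = standard_error(values)
    return m, m - se, m + se


print(error_bar([12.0, 15.5, 9.0, 20.0, 14.5]))  # made-up incomes
```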


Interval and Interquartile Range 


If the data follow a normal distribution, we might expect 50% of the values to fall
within the mean plus or minus 0.674 times the standard deviation. Box plots also 
identify where the central 50% of the observations fall (within the interval bounded by 
the box); however, normal theory is not used to determine the interval. 
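The two 50% intervals can be compared numerically with Python's standard library. In this sketch the constant 0.674 comes from the normal quantile at 0.75; the quartiles use Python's default `quantiles` method, which is an assumption and may differ slightly from the hinges SYSTAT uses for box plots.

```python
from statistics import NormalDist, mean, quantiles, stdev

# z such that mean +/- z * sd covers the central 50% of a normal distribution
Z50 = NormalDist().inv_cdf(0.75)   # approximately 0.6745


def normal_50_interval(values):
    """Normal-theory interval expected to contain 50% of the values."""
    m, s = mean(values), stdev(values)
    return m - Z50 * s, m + Z50 * s


def quartile_interval(values):
    """First and third quartiles (Python's default 'exclusive' method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q1, q3


skewed = [2, 3, 3, 4, 5, 6, 8, 12, 40, 95]   # made-up right-skewed data
print(normal_50_interval(skewed))
print(quartile_interval(skewed))
```

For right-skewed data like these, the large values inflate the standard deviation, so the normal-theory interval comes out much wider than the quartile interval.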

Since SYSTAT allows you to overlay displays easily, let's compare the interval
(μ ± 0.674σ) with the interquartile range of the box plot. Both intervals should include
50% of the values. For the normal theory interval, we use a dot display with box error 
bars. The data are U.S. dollars spent per person for education (EDUC) in Islamic and 
New World countries. 

Here, we request two overlaid displays, each including a box plot and a dot display:
the first with the data as measured, the second with EDUC in log base 10 units.
The input is:


USE OURWORLD
BEGIN
SELECT GROUP$ <> 'Europe'
BOX EDUC * GROUP$ / AXES=NONE SCALE=NONE LOC=-3,0
DOT EDUC * GROUP$ / ERROR=.5 ETYPE=BOX FILL=0 COLOR=RED,
SYMBOL=18 ETHICK=0.2 LOC=-3,0,
YLAB='Dollars (per person) for education'
BOX EDUC * GROUP$ / YLOG AXES=NONE SCALE=NONE LOC=3,0
DOT EDUC * GROUP$ / ERROR=.5 ETYPE=BOX FILL=0 YLOG,
COLOR=RED SYMBOL=18 ETHICK=0.2,
LOC=3,0,
YLAB='Dollars (per person) for education'
END
The output is:
For each group in the display on the left, the dot display's normal theory interval (the
tall thin rectangle) is at least twice as long as the box, and the box plots show that
the distributions are very right-skewed and have outliers. The wide horizontal line in
each box marks the median. The asterisk (*) within the dot display rectangle marks the
mean. Note that both means fall at the top of the box, that is, near the 75th percentile.
The values of these statistics for the 16 Islamic and 21 New World countries are:


            Islamic    New World
Median       $13.67       $57.39
Mean          68.18       123.22


In the display on the right, the length of the thin dot rectangle corresponds to the length
of the box (the coverage of the two types of intervals is remarkably alike), and
agreement between the median and mean is much closer. After taking logs, the
statistics are:


            Islamic            New World
Median      1.132              1.759
Mean        1.280 ($19.06)     1.734 ($54.20)
We used the calculator to transform the mean back to dollars (for example, if you enter
CALC 10^1.280, SYSTAT returns 19.06).
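Back-transforming the mean of log10 values gives the geometric mean of the original data. A small Python sketch (the helper name is hypothetical):

```python
from math import log10


def log_mean_and_back_transform(values):
    """Mean of log10(values) and 10 ** that mean (the geometric mean)."""
    logs = [log10(v) for v in values]
    m = sum(logs) / len(logs)
    return m, 10 ** m


print(log_mean_and_back_transform([1, 10, 100]))  # (1.0, 10.0)
print(round(10 ** 1.280, 2))                      # 19.05
```

Computed directly, 10 ** 1.280 is about 19.05; the $19.06 reported in the table presumably reflects a mean of logs that was rounded to 1.280 for display.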

Why is the agreement between the two 50% intervals so poor in the display on the 
left and so good on the right? The distributions are very right-skewed on the left and 
fairly symmetric on the right. Features based on normal theory (such as standard error 
bars) should not be used for highly skewed distributions. 

On the left, the displays are squeezed into the lower fourth of the frame. Some might
wonder about the vast white space above the rectangles. SYSTAT uses the minimum
and maximum sample values to set the scale on the y axis. If you produce a plot that
has features compressed into a small area, stop and think about your data: the white
space may be a flag that your data are inappropriate for the display (for example, you
could log transform the data on the left).


Counts 


SYSTAT can also produce error bars for counts. If your counts are binomially or 
multinomially distributed, you may want a confidence interval for the count in each 
bar. We do this for the number of buses failing after driving a given distance (1 of 10 
distances). 
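The text does not say which formula SYSTAT uses for count error bars. Under the multinomial assumption mentioned above, one standard choice is the standard error of each cell count, sqrt(N p (1 - p)) with p estimated as count / N. The sketch below computes that; the helper name and the counts are hypothetical.

```python
from math import sqrt


def multinomial_count_se(counts):
    """Standard error of each bar's count, treating the counts as multinomial:
    sqrt(N * p * (1 - p)) with p estimated by count / N."""
    total = sum(counts)
    return [sqrt(total * (c / total) * (1 - c / total)) for c in counts]


counts = [6, 11, 16, 25, 34, 46, 33, 16, 2, 2]   # made-up failures per distance
for distance, (c, se) in enumerate(zip(counts, multinomial_count_se(counts)), 1):
    print(distance, c, round(se, 2))
```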


The input is: 


USE BUSES 

FREQ COUNT 

CAT DISTANCE 

BAR DISTANCE / SERROR FILL=.3 YMAX=64
The output is:

(Figure: bar chart of the count for each DISTANCE category, 1 to 10, with error bars.)


User-Supplied Error Bars 


As an alternative to SYSTAT's automatic error bars, you can specify a variable whose 
values determine the length of the error bars. To generate user-supplied error bars, you 
should start with a file in which each record contains a measurement (for example, the 
mean) and its standard deviation or standard error. Such a file can be created using 
Statistics, for example. It is up to you whether you choose standard deviations, 
standard errors of the mean, two times the standard deviation, or some other measure 
of spread. (Note that if your file contains more than one case per group, SYSTAT uses
the mean of the error variable's values for each group to determine the length of the
error bars.)
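The averaging rule in the note can be illustrated with a short Python sketch. The helper name and the (group, error value) row format are hypothetical; the rule itself (average the error values within each group) is the one stated above.

```python
from collections import defaultdict


def error_bar_lengths(rows):
    """Mean of the error variable within each group.

    rows is an iterable of (group, error_value) pairs; a group with several
    cases gets the average of its error values.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, err in rows:
        sums[group] += err
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}


print(error_bar_lengths([("LOW", 0.03), ("LOW", 0.05), ("HIGH", 0.02)]))
```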


Suppose we have the following data (in the ENERGY file): 


DENSITY   SE    ENERGY$
  .09     .03   LOW
  .40     .04   MEDIUM
  .45     .02   HIGH


To create a bar chart with error bars extending the value of SE in each direction, we
specify ERROR=SE.

The input is:
USE ENERGY 


ORDER ENERGY$ / SORT=NONE 
BAR DENSITY * ENERGY$ / ERROR=SE FILL=0


The output is: 


(Figure: bar chart of DENSITY for each level of ENERGY$, with error bars.)


Plot Points 


Sometimes we want plots that show standard errors around plotted points. For 
example, each point might represent a group of plants receiving the same dose of a 
growth hormone. In this case, you can plot the standard error of the mean around the 
point for the group. 
Following are some data with a standard error variable, in the file GROWTH:


DOSE GROWTH SE 


500 110 > 
800 112 6 
1000 116 7 
1200 118 9
1400 120 13 
1700 135 20 
1900 140 22 
2200 150 24 
2900 210 26 
The input is: 
USE GROWTH
DOT GROWTH * DOSE / FILL ERROR=SE YMAX=250
We set YMAX at 250 so that the error bars fit. 


The output is: 
Example 21
Control Limits 


Dashed lines on the x or y axes can be used to represent data limits. Quality control
charts, for example, mark upper and lower limits to indicate permissible bounds for a
production process. For most plots, specify two numbers (upper and lower) for the 
limits. They need not be in order. If you want to draw only one dashed line, specify one 
number. 
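As a sketch of how such limit lines partition the data, the hypothetical Python helper below labels each value relative to one or two limits. Accepting the limits in either order mirrors the statement above that they need not be in order; the values are invented for illustration.

```python
def classify(values, limits):
    """Label each value against one or two limit lines.

    Limits may be given in either order; with a single limit, 'within' means
    a value lying exactly on the line.
    """
    lims = sorted(limits)
    if len(lims) not in (1, 2):
        raise ValueError("give one or two limits")
    lower, upper = lims[0], lims[-1]
    labels = []
    for v in values:
        if v < lower:
            labels.append("below")
        elif v > upper:
            labels.append("above")
        else:
            labels.append("within")
    return labels


print(classify([460, 500, 560], [530, 472]))
```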

Below we plot the average math SAT exam score for each U.S. state. The 25th 
percentile of the 50 scores is 472, and the 75th percentile is 530. We mark these 
quartiles on the display and, to save vertical space, use SYSTAT’s built-in case 
sequence variable, CASE, to separate the states into two panels. We also reverse the 
scale of the y axis to place the first case in the file at the top of the display. 


The input is: 


USE USSTATES 
ORDER STATE$ / SORT=NONE
BEGIN
SELECT CASE <= 25
DOT MATH * STATE$ / YMIN=400 YMAX=600 HEI=6IN,
YLIMIT=472,530 TRANS XREV,
XLAB=' '
SELECT CASE > 25
DOT MATH * STATE$ / YMIN=400 YMAX=600 HEI=6IN,
YLIMIT=472,530 TRANS XREV,
XLAB=' '
END 
The average math SAT scores for Wisconsin, Minnesota, Iowa, North Dakota, and
South Dakota are above 550, while two states have scores below 450. 
Density Charts


Leland Wilkinson 


The density of a sample is the relative concentration of data points in intervals across 
the range of the distribution. A histogram is one way to display the density of a 
quantitative variable; box plots, dot or symmetric dot density, frequency polygons, 
fuzzygrams, jitter plots, density stripes, and histograms with data-driven bar widths 
are others. 

A histogram is the most familiar of these displays. The word comes from
a Greek word (histos) for a straight standing beam, like a mast or loom frame, and a 
word (gram) for a drawn picture. Thus, a histogram is a pictorial display of vertically 
standing bars. It is a crude density estimator because the shape of a histogram depends 
upon the choice of the number of bars. Most other graphical density estimation 
methods depend on subjective choices of parameters (or settings) as well, which is 
one reason the general field of density estimation is rather controversial (Wegman, 
1982). 
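The dependence of a histogram's shape on the number of bars is easy to demonstrate. This Python sketch bins the same made-up sample into two and then four equal-width bars spanning the data range; the helper name is hypothetical and the binning rule is the plain equal-width one, not SYSTAT's heuristic.

```python
def histogram_counts(values, nbars):
    """Counts in nbars equal-width bars spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbars
    counts = [0] * nbars
    for v in values:
        i = int((v - lo) / width) if width else 0
        counts[min(i, nbars - 1)] += 1   # the maximum goes in the last bar
    return counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
print(histogram_counts(data, 2))   # [8, 2]
print(histogram_counts(data, 4))   # [3, 5, 1, 1]
```

With two bars the sample looks like a single lump with a short tail; with four bars a peak near 3 and the isolated value at 9 become visible.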

SYSTAT can use the sample mean and standard deviation to construct a normal
curve (or cumulative normal curve) for comparison with the actual shape of the
sample distribution. Kernel estimates are also available for both density and
distribution curves.

Rather than comparing sample values to the normal distribution (mean, standard 
deviation, and so on), box plots show robust statistics (median, quartiles, and so on). 
Some complain that box plots or the choice of intervals for bars in a histogram can 
mask gaps or separations in the distribution. Dot histograms (dit) and symmetric dot 
displays (dot) answer this problem because they display every value in the sample. It 
is often useful to examine both a box plot and a dit display. A gap histogram is another 
alternative. Its bar widths vary across the range of the distribution—when there are 
gaps, the neighboring bar is made wider to include the gap. 
Fuzzygrams superimpose a probability distribution on each bar of a histogram. Bars
for histograms based on small samples are fuzzier than bars for large sample 
histograms. Jittered dot density displays points by calculating the exact locations of the 
data values and then, to keep points from colliding, jittering them randomly on a short 
vertical axis. These displays work better for large samples than small samples. Density 
stripes are vertical lines placed at the location of data values along a horizontal data 
scale and look like supermarket bar codes. For large samples, the stripes tend to 
collide, so you should consider a jitter dot density instead. 

Bivariate densities can be displayed as 3-D histograms and as 2-D surfaces or 
contours constructed using normal theory sample statistics or a nonparametric kernel 
estimator. 

These displays can be stratified across the levels of a grouping variable; for 2-D 
displays, if the grouping variable has only two values, a dual (or back-to-back) version 
is available. 
Sample Displays


Histogram Frequency polygon 
3-D histogram
Bivariate normal density


Kernel density estimator stratified by sex 
Histograms show the sample density of a continuous variable with a series of vertical
bars. The height of each bar represents the number of cases that fall within a given
interval.


To open the Histogram dialog box, from the menus choose: 


Graph 
Histogram... 


(Figure: Graph:Histogram dialog box, Main tab.)
X and Y variables. The structure of the chart depends on the variables that you select.
For example, select one or more x variables to obtain a univariate histogram for each 
variable, or select a y variable and a categorical x variable to obtain a bivariate 
histogram that displays the sample density within each category. 


Repeated trials. Specify two or more y variables, and select Repeated trials to display 
repeated measures. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 


Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


Type of display. You can choose from a histogram, a gap histogram (with data-driven 
bar widths), a frequency polygon, or a fuzzygram (which superimposes a probability 
distribution on each bar of the histogram). 


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 


Density of Y*X as Z variable. Creates a temporary z variable that acts as a
crosstabulation of x and y to produce a three-dimensional display (or a
two-dimensional mosaic).


In addition, you can change coordinate systems and customize the layout, axes, and 
appearance of the graphs. 
Histogram Options

(Figure: Graph:Histogram dialog box, Options tab.)


The following options are available: 


■ Display cumulative frequency. Each bar's area is the sum of the preceding bar's
area and its own incremental area. This makes a cumulative histogram correspond
to a cumulative frequency distribution or, for continuous data, a distribution
function.

■ Number of bars. You can specify the number of bars (intervals or bins) displayed.
The size of each interval depends on the range of the data.

■ Width of bars. You can specify the width of each interval. The actual number of
bars depends on the range of the data. If you also specify a number of bars, this
option takes precedence.
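The interplay between Number of bars and Width of bars, and the running totals behind a cumulative histogram, can be sketched in Python. Starting the first bar at the data minimum is an assumption for illustration (SYSTAT also rounds scale values, which this sketch ignores), and the helper names are hypothetical.

```python
import math


def bar_edges(lo, hi, nbars=None, width=None):
    """Edges of equal-width bars starting at lo.

    A given width takes precedence: the number of bars then follows from the
    data range. Otherwise the width follows from the number of bars.
    """
    if width is not None:
        nbars = max(1, math.ceil((hi - lo) / width))
    elif nbars is not None:
        width = (hi - lo) / nbars
    else:
        raise ValueError("specify nbars or width")
    return [lo + i * width for i in range(nbars + 1)]


def cumulative(counts):
    """Running totals: each bar holds its own count plus all earlier bars."""
    totals, running = [], 0
    for c in counts:
        running += c
        totals.append(running)
    return totals


print(bar_edges(0, 70, nbars=7))
print(bar_edges(0, 70, nbars=7, width=20))  # width wins: 4 bars
print(cumulative([3, 5, 1, 1]))
```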


Histogram Display Types 


In addition to a standard histogram, the following alternative display types are also 
available: 


Frequency polygons. Frequency polygons are produced by connecting the tops of the 
bars of a histogram with a line and then removing the bars. The shape of the polygon 
depends on the number of bars or cutpoints chosen. You may want to experiment with 
differing numbers of bars to see the differences. 


Gap histograms. The Gap histogram option produces histograms with data-driven bar 
widths. Bars are wider where there are “gaps” in your data. That is, when no value is 
reported, the width of the interval is increased to include that for a neighbor with data. 
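The text does not give SYSTAT's exact rule for widening bars. A minimal Python sketch, assuming we start from equal-width bins and merge each empty bin into the preceding bar (leading empty bins join the first nonempty bar), could look like this. Heights are count divided by width so that a widened bar keeps an area proportional to its count; that choice, the helper name, and the assumption that all values lie in [lo, hi] are illustrative.

```python
def gap_histogram(values, lo, hi, nbins):
    """Start from nbins equal-width bins and widen bars across empty bins.

    Returns (left, right, count, height) tuples, where height = count / width.
    Assumes every value lies in [lo, hi].
    """
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    bars = []  # each bar is [left, right, count]
    for i, c in enumerate(counts):
        left, right = lo + i * width, lo + (i + 1) * width
        if bars and (c == 0 or bars[-1][2] == 0):
            bars[-1][1] = right          # extend the previous bar over this bin
            bars[-1][2] += c             # (also absorbs leading empty bins)
        else:
            bars.append([left, right, c])
    return [(l, r, c, c / (r - l)) for l, r, c in bars]


for bar in gap_histogram([0.5, 1.5, 4.5], lo=0, hi=5, nbins=5):
    print(bar)
```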


Fuzzygrams. The Fuzzygram option produces a variation of the histogram that 
superimposes a probability distribution on each bar (Wilkinson, 1983). The purpose is 
to make the bars for histograms based on small samples fuzzier than the bars for large 
sample histograms. This distinguishes small variations or features from large 
variations or features. 

SYSTAT computes fuzzygrams in the following manner. Let $p_i = n_i/n$ be the
sample estimate of $\pi_i$, the expected proportion of a sample of $n$ values from a
continuous distribution to fall in the $i$th of $k$ histogram bars ($i = 1, \ldots, k$).
Assume that $k$ is selected such that $0 < \pi_i < 1$. A fuzzygram is a histogram with
bars represented by a gray-scale distribution on $P(p_i > \pi_i)$. That is, the more
likely $p_i$ is greater than $\pi_i$, the lighter the bar.

The program computes the gray scale assuming that the joint distribution of the
counts in the bars is multinomial. Using an arcsine transformation of the square root of
the $p_i$, it chooses the gaps between successive stripes in the bar from a normal variate
with variance $1/(4n)$. This distribution is adjusted for the number of bars in the
display using an approximation to Bonferroni deviates. Haber and Wilkinson (1982)
discuss perceptual issues in viewing the fuzzygram. A vertical line in the center of each
bar reveals the height of the bar in the sample. If the sample size is small, the graph
will be very blurred; if it is large, the bars are sharp. Finally, if a normal curve is
overlaid, the curve will pass cleanly through the fuzzy part of the graph (that is, the
expected part).
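The ingredients named above (multinomial counts, the arcsine square-root transform with variance 1/(4n), and a Bonferroni adjustment across k bars) can be combined into an approximate band of plausible proportions for one bar. This Python sketch is only an approximation built from those ingredients: it computes the range the fuzzy region of a bar would span, not SYSTAT's actual gray-scale rendering, and the alpha level and function name are assumptions.

```python
from math import asin, pi, sin, sqrt
from statistics import NormalDist


def fuzzy_band(count, n, k, alpha=0.05):
    """Range of plausible proportions for one bar of a k-bar histogram.

    Works on the arcsine-square-root scale, where asin(sqrt(p)) has variance of
    about 1/(4n); the normal deviate is Bonferroni-adjusted for k bars.
    Returns (low, high) on the proportion scale.
    """
    p = count / n
    theta = asin(sqrt(p))
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    half_width = z / (2 * sqrt(n))          # z times the SD 1/(2*sqrt(n))
    low = max(0.0, theta - half_width)
    high = min(pi / 2, theta + half_width)
    return sin(low) ** 2, sin(high) ** 2


# Same proportion (0.25), small versus large sample, 10 bars
print(fuzzy_band(5, 20, k=10))
print(fuzzy_band(500, 2000, k=10))
```

The small sample gives a much wider band, which is why small-sample fuzzygrams look blurred.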


Box Plot Dialog Box 


SYSTAT creates box plots, notched box plots, and box plots combined with 
symmetrical dot densities. In a box plot, the center vertical line marks the median of 
the sample. The length of each box shows the range within which the central 50% of 
the values fall, with the box edges (called hinges) at the first and third quartiles. The 
whiskers show the range of values that fall within the inner fences (but do not 
necessarily extend all the way to the inner fences). Values between the inner and outer 
fences are plotted with asterisks. Values outside the outer fence are plotted with empty 
circles. The fences are defined as follows: 


Lower inner fence = lower hinge - (1.5 * Hspread)
Upper inner fence = upper hinge + (1.5 * Hspread)
Lower outer fence = lower hinge - (3 * Hspread)
Upper outer fence = upper hinge + (3 * Hspread)


Hspread is comparable to the interquartile range or midspread. It is the absolute value
of the difference between the values of the two hinges. The whiskers show the range
of values that fall within 1.5 Hspreads of the hinges. They do not necessarily extend to
the inner fences. Values outside the inner fences are plotted with asterisks. Values
outside the outer fences, called far outside values, are plotted with empty circles.
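The fence definitions translate directly into code. This Python sketch assumes Tukey's hinges (medians of the lower and upper halves of the sorted data, each half including the median when n is odd); SYSTAT's exact hinge computation may differ slightly, and the helper names and data are hypothetical.

```python
from statistics import median


def tukey_hinges(values):
    """Lower and upper hinges: medians of the lower and upper halves of the
    sorted data (each half includes the median when n is odd)."""
    xs = sorted(values)
    half = (len(xs) + 1) // 2
    return median(xs[:half]), median(xs[len(xs) - half:])


def fences(values):
    lower, upper = tukey_hinges(values)
    hspread = abs(upper - lower)
    return {
        "inner": (lower - 1.5 * hspread, upper + 1.5 * hspread),
        "outer": (lower - 3 * hspread, upper + 3 * hspread),
    }


def classify_point(x, f):
    """'inside', 'outside' (asterisk), or 'far outside' (empty circle)."""
    (inner_lo, inner_hi), (outer_lo, outer_hi) = f["inner"], f["outer"]
    if x < outer_lo or x > outer_hi:
        return "far outside"
    if x < inner_lo or x > inner_hi:
        return "outside"
    return "inside"


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
f = fences(data)
print(f, [classify_point(x, f) for x in (5, 20, 30)])
```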


To open the Box Plot dialog box, from the menus choose: 


Graph 
Box Plot... 
(Figure: Graph:Box Plot dialog box, Main tab.)


X and Y variable(s). The structure of the chart depends on the variables that you select.
For example, select one or more x variables to obtain univariate plots, or select a y
variable and a categorical x variable to obtain a plot with a separate box for each
category.




Repeated trials. Specify two or more y variables, and select Repeated trials to display
repeated measures.
Grouping variable(s). Categorical variables that specify group membership.
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 

Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 


In addition, you can change coordinate systems and customize the layout, axes, and 
appearance of the box plots. 
Box Plot Options

(Figure: Graph:Box Plot dialog box, Options tab.)


The following options are available: 


■ Add notches that mark confidence intervals. Boxes are notched (narrowed) at the
median and return to full width at the lower and upper confidence interval values.

■ Combine with symmetrical dot density. Overlay a box plot and a dot density plot
in the same display.
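The text does not give the notch formula. One widely used convention, assumed here, places the notch ends at the median plus or minus 1.58 × Hspread / √n, an approximate confidence interval for the median. A Python sketch with a hypothetical helper name:

```python
from math import sqrt


def notch_interval(median, lower_hinge, upper_hinge, n, c=1.58):
    """Approximate confidence interval for the median:
    median +/- c * Hspread / sqrt(n)."""
    hspread = abs(upper_hinge - lower_hinge)
    half_width = c * hspread / sqrt(n)
    return median - half_width, median + half_width


print(notch_interval(5, 3, 7, 9))
```

When the notches of two boxes do not overlap, that is rough evidence that the two medians differ.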
Dot Density Dialog Box


Dot Density plots dots or stripes along a horizontal data scale at the exact locations of
data values. Every value in the sample is displayed, thereby avoiding one of the
limitations of histograms or box plots, both of which can mask gaps or separations in
the distribution.


To open the Dot Density dialog box, from the menus choose: 


Graph 
Dot Density... 


(Figure: Graph:Dot Density dialog box, Main tab.)
X and Y variables. The structure of the chart depends on the variables that you select.
For example, select one or more x variables to obtain univariate plots, or select a y 
variable and a categorical x variable to obtain a bivariate plot that shows the sample 
density for each category. 


Repeated trials. Specify two or more y variables, and select Repeated trials to display 
repeated measures. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 

Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


Type of display. You can choose from a dot histogram (DIT), symmetrical dot density
(DOT), jittered dot density, and density stripes.


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 


In addition, you can change coordinate systems and customize the layout, axes, and 
appearance of the displays.


Dot Density Display Types 


The following dot density display types are available: 


Dot histograms. The Dot histogram (Dit) option produces a density display that looks 
similar to a histogram. Unlike histograms, dot histograms represent every observation 
with a unique symbol, so they are especially suited for small- to moderate-sized 
samples of continuous data. The resolution of the graph is controlled by the size of the 
plot symbol. Dot histograms also resemble stem-and-leaf diagrams without the stems 
and substitute circle symbols for the leaves. When used with a grouping variable, this 
display type produces within-group density displays that are useful for screening data 
for analysis of variance and other procedures that group data. 
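A minimal Python sketch of how values can be grouped into dot stacks whose resolution is set by the dot size. The grouping rule (each stack starts at the smallest unplaced value, takes every value less than one dot width above it, and is drawn at the mean of its members) is an assumption for illustration; the text says only that resolution is controlled by the plot symbol size.

```python
def dot_stacks(values, dot_width):
    """Group sorted values into stacks no wider than one dot.

    Returns (position, number_of_dots) for each stack.
    """
    if dot_width <= 0:
        raise ValueError("dot_width must be positive")
    xs = sorted(values)
    stacks = []
    i = 0
    while i < len(xs):
        j = i
        while j < len(xs) and xs[j] - xs[i] < dot_width:
            j += 1
        group = xs[i:j]
        stacks.append((sum(group) / len(group), len(group)))
        i = j
    return stacks


print(dot_stacks([1.0, 1.2, 1.4, 3.0, 3.1], dot_width=0.5))
```

Larger symbols (a larger dot_width) merge more values into each stack, giving a coarser picture.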


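Symmetrical dot density plots. The Symmetrical dot density (Dot) option produces
dot displays in which each stack of dots is centered on a horizontal line, so the display
is symmetric above and below it rather than built upward from the axis.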
Jittered dot density plots. The Jittered dot density option places points along a
horizontal data scale at the locations of data values. To keep points from colliding, they 
are located randomly on a short vertical axis. Unlike histograms, no binning into bars 
is required. Unlike density stripes, jittered dot density displays work well with large 
samples because points do not overlap. Jittering is less appropriate for small samples, 
however, because the quantity of points is usually not sufficient to indicate a density in 
a given region. 
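The jittering step itself is simple: keep each point's exact data value on the horizontal axis and draw its vertical position at random from a short interval. In this Python sketch the uniform distribution, the height, and the seed are assumptions; the text does not specify how SYSTAT draws the vertical positions.

```python
import random


def jitter(values, height=1.0, seed=None):
    """Place each point at its exact x value and a random y in [0, height].

    A seed makes the layout reproducible from one drawing to the next.
    """
    rng = random.Random(seed)
    return [(x, rng.uniform(0.0, height)) for x in values]


points = jitter([3.2, 3.2, 3.2, 5.0], height=0.25, seed=42)
for x, y in points:
    print(f"x={x:.1f}  y={y:.3f}")
```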

Density stripes. The Density stripes option places vertical lines at the location of data 
values on a horizontal data scale. Unlike histograms, no binning into bars is required, 
so density stripes are especially suited for small- to moderate-sized samples of 
continuous data. 


Density Function Dialog Box 
Density function plots normal or kernel curves for one or more variables. 
To open the Density Function dialog box, from the menus choose: 


Graph 
Distribution Plots 
Density Function... 
(Figure: Graph:Distribution Plots:Density Function dialog box, Main tab.)


X and Y variables. The structure of the chart depends on the variables that you select.
Select one or more x variables to draw a curve in a separate frame for each variable, or
select a y variable and a categorical x variable to plot a curve for each category of the
x variable in a single plot.
Repeated trials. Specify two or more y variables, and select Repeated trials to display 
repeated measures. 
Grouping variable(s). Categorical variables that specify group membership.
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 

Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


Type of display. You can choose from a normal curve or a kernel curve. 


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different colors, symbols, or patterns distinguish 
separate plots or subpopulations. 

Density of Y*X as Z variable. Creates a temporary z variable that acts as a
crosstabulation of x and y to produce a three-dimensional display (or a
two-dimensional mosaic).


In addition, you can change coordinate systems and customize the layout, axes, and
appearance of the displays. 
Density Function Options

(Figure: Graph:Distribution Plots:Density Function dialog box, Options tab.)


The following options are available: 


■ Display cumulative frequency. Each bar's area is the sum of the preceding bar's
area and its own incremental area.

■ Number of bars. You can specify the number of bars (intervals or bins) displayed.
The size of each interval depends on the range of the data.

■ Width of bars. You can specify the width of each interval. The actual number of
bars depends on the range of the data. If you also specify a number of bars, this
option takes precedence.

■ Tension. You can specify the degree to which the line or surface is allowed to flex
locally to fit the data. Specify a value between 0 and 1. Lower values allow a
greater degree of local flex.


Using Commands 


After specifying the data file with USE filename: 


For univariate displays:                      DENSITY x-varlist / type options

For density displays for subgroups
within a single frame:                        DENSITY y-varlist * x-var / type options

For bivariate densities:                      DENSITY . * y-var * x-var / type options


Replace type with: 

HIST, for a histogram 

BOX, for a box plot 

DIT, for a dot histogram 

DOT, for a symmetric dot density plot 
DOX, for a box plot combined with a dot plot 
GAP, for a gap histogram 

FUZZY, for a fuzzygram 

JITTER, for a jittered dot density plot 
STRIPE, for a density stripe 
NORMAL, for a normal curve 
KERNEL, for a kernel curve 
Examples


Example 1 
Histograms, Frequency Polygons, and Cumulative Histograms 


In this example, the SURVEY2 data file (Afifi, Clark and May, 2003) is used to create 
a histogram, a frequency polygon, a cumulative histogram, and a gap histogram to 
display the distribution of personal income (in thousands of dollars per year). The 
SURVEY2 data contain one case for each of the 256 people in the sample.


The input is: 


USE SURVEY2 
BEGIN 
DENSITY INCOME / LOC=-3IN,-3IN TITLE='Histogram'
DENSITY INCOME / POLY LOC=-3IN,3IN,
TITLE='Frequency Polygon'
DENSITY INCOME / CUM LOC=3IN,-3IN,
TITLE='Cumulative Histogram'
DENSITY INCOME / GAP LOC=3IN,3IN TITLE='Gap Histogram'


END 


As an alternative, you can replace DENSITY with HIST. You can use varying shades of 
gray (or screens with varying density) by specifying a number between 0 and 1 for
FILL. 


The output is:

(Figure: histogram displays of INCOME, including panels titled Histogram and Cumulative Histogram; x axis INCOME.)


In all but the gap histogram, the scale on the left (Count) helps to show the number of 
cases in each bar. For example, from the bar on the right side of the histogram (between 
60 and 70), we see that 10 people earn around $65,000 per year. From the cumulative 
histogram, we see that more than 60 people earn $10,000 or less per year, and almost 
150 people earn $20,000 or less. The scale on the right of the histogram (Proportion 
per Bar) is the proportion of the sample in each bar. 

The horizontal scale is chosen to provide round numbers. SYSTAT uses initial
estimates and a heuristic strategy to pick the number of bars and scale values to
produce an aesthetic histogram (Sturges, 1926; Doane, 1976; Scott, 1979). If there are
too many empty bars (more than half), the program reduces the number. If some bars
are too tall, the program increases the number. Several options let you adjust the values
selected by the program. You can even do this interactively.
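SYSTAT's heuristic is not spelled out here, but two of the textbook starting points it cites can be written down directly. In this Python sketch Sturges' rule gives a number of bars and Scott's rule gives a bar width; Doane's refinement is omitted, and how SYSTAT then adjusts these values is not shown. The function names are hypothetical.

```python
import math
from statistics import stdev


def sturges_bars(n):
    """Sturges (1926): roughly 1 + log2(n) bars, rounded up."""
    return math.ceil(math.log2(n)) + 1


def scott_width(values):
    """Scott (1979): bar width of about 3.49 * s * n ** (-1/3)."""
    return 3.49 * stdev(values) * len(values) ** (-1 / 3)


print(sturges_bars(256))  # 9 bars for the 256 cases in SURVEY2
print(round(scott_width([1, 2, 2, 3, 3, 3, 4, 4, 5, 9]), 3))
```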
Transformations


The distribution of INCOME is not symmetric: it is right-skewed. Compare the histogram
of the square root of INCOME below with the first histogram.


The input is: 
USE SURVEY2 
DENSITY INCOME / FILL=.1 XPOW


The output is: 


(Figure: histogram of INCOME on a square-root scale.)


Selecting a transformation interactively. You can produce this same histogram for the 
square root of income dynamically. Starting with a display of income in the original
units, you can use the Dynamic Explorer to manipulate the variable units until the 
desired shape is obtained. Alternatively you can use the axis setting option in the Graph 
Editor and make the square root transformation. 

To apply a log transformation, replace XPOW with XLOG in the command script 
prior to producing the graph or use the axis setting option in the Graph Editor tab after 
producing the graph. If you need a more complex transformation, you can always 
specify a LET or an IF ... THEN LET statement before the plot request. 
Example 2
Density Function or Distribution Curves 


If data are sampled from a normal distribution, then their histogram tends to have a 
normal shape. We can examine this possibility by using the sample mean and standard 
deviation to draw a normal curve. If you do not want to impose a functional form on 
the estimate of the density or distribution function curves, you may prefer a 
nonparametric kernel density estimator. 
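A Python sketch of the two kinds of curve: the normal curve uses only the sample mean and standard deviation, while the kernel estimate averages a small normal bump centered at each observation. The Gaussian kernel and the bandwidth value are assumptions for illustration; the text does not say which kernel or bandwidth SYSTAT uses.

```python
from math import exp, pi, sqrt
from statistics import NormalDist, mean, stdev


def normal_curve(data):
    """Density of a normal distribution with the sample mean and SD."""
    return NormalDist(mean(data), stdev(data)).pdf


def kernel_curve(data, bandwidth):
    """Gaussian kernel density estimate: the average of normal bumps of
    standard deviation `bandwidth` centered at each data value."""
    n = len(data)
    scale = 1.0 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        return scale * sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


sample = [1, 2, 2, 3, 3, 3, 4, 9]   # made-up, right-skewed values
normal_f = normal_curve(sample)
kernel_f = kernel_curve(sample, bandwidth=0.8)
for x in (0, 3, 6, 9):
    print(x, round(normal_f(x), 4), round(kernel_f(x), 4))
```

The normal curve is forced to be symmetric about the mean, while the kernel curve follows the lump near 3 and the separate value at 9.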


The input is: 


USE SURVEY2 

BEGIN
DENSITY INCOME / NORMAL XLIMIT=19 AXES=BOX LOC=-3IN,0IN
DENSITY INCOME / NORMAL XLIMIT=19 AXES=BOX CUM LOC=3IN,0IN
END


In each display, we set the upper limit to 19 to mark the median of the sample. 


The output is: 
[Figure: left, histogram of INCOME with a normal curve; right, cumulative distribution of INCOME with a cumulative normal curve; x-axes: INCOME]


Because the distribution is right-skewed, the peak of the normal curve (which lies at the
sample mean) falls to the right of the median. On the cumulative normal curve, the
median corresponds to a cumulative proportion closer to 0.4 than to 0.5.
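The normal curve in these displays takes the sample mean and standard deviation as its parameters. The Python sketch below (standard library only, hypothetical right-skewed data, hypothetical function name) fits those two values and evaluates the fitted cumulative normal at the sample median. For right-skewed data the mean exceeds the median, so the result falls below 0.5, which is the pattern described above.

```python
import statistics
from statistics import NormalDist

def normal_fit_at_median(data):
    """Fit N(mean, sd) to the data and return the fitted CDF at the sample median."""
    fitted = NormalDist(mu=statistics.mean(data), sigma=statistics.stdev(data))
    return fitted.cdf(statistics.median(data))

# Hypothetical right-skewed incomes (thousands of dollars).
incomes = [6, 8, 9, 11, 12, 14, 15, 17, 19, 22, 26, 31, 38, 50, 65]
p = normal_fit_at_median(incomes)
print(f"fitted normal CDF at the median: {p:.2f}")
```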


Here, we draw the same curves on a square root scale.
The input is:
BEGIN
DENSITY INCOME / NORMAL XLIMIT=19 AXES=BOX XPOW,
LOC=-3IN,0IN
DENSITY INCOME / NORMAL XLIMIT=19 AXES=BOX CUM,
XPOW LOC=3IN,0IN
END
The output is: 
[Figure: left, histogram of INCOME with a normal curve on a square root scale; right, the corresponding cumulative distribution; x-axes: INCOME]
The agreement between the mean and median improves after transformation. 
Kernel Curves 


Below are two kernel density curves for income: the first is before a square root 
transformation, and the second is after a square root transformation. 
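A kernel density curve averages a small bump centered at every observation. The manual names the Epanechnikov kernel for its bivariate estimator (Example 9). Assuming the same kernel for these univariate curves, and a bandwidth chosen by hand (SYSTAT's own bandwidth rule is not described here), a minimal Python sketch is:

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_density(data, x, bandwidth):
    """Estimate f(x) = (1 / (n h)) * sum of K((x - xi) / h) over the observations."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(data)
    return sum(epanechnikov((x - xi) / bandwidth) for xi in data) / (n * bandwidth)

# Evaluate the curve on a grid, as a plotting routine would.
sample = [2.0, 3.0, 3.5, 4.0, 7.0]
grid = [i * 0.5 for i in range(21)]          # 0.0, 0.5, ..., 10.0
curve = [kernel_density(sample, g, bandwidth=1.5) for g in grid]
```

A larger bandwidth gives a smoother curve; a smaller one follows individual observations more closely.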


The input is: 


USE SURVEY2 


BEGIN 
DENSITY INCOME / KERNEL XLIMIT=19 AXES=BOX LOC=-3IN,0IN
DENSITY INCOME / KERNEL XLIMIT=19 AXES=BOX,
XPOW LOC=3IN,0IN


END 
The output is:

[Figure: kernel density curves for INCOME, before (left) and after (right) a square root transformation; x-axes: INCOME]


Overlaying Curves 


It is easy to overlay curves on histograms using commands. Precede the first plot
request with BEGIN and add END after the last request. To draw the plot's scale and
axes only once, set AXES and SCALE equal to NONE on the second graph. You can turn
off the labels by setting XLAB and YLAB to blanks.


The input is: 


USE SURVEY2 
BEGIN 

DENSITY INCOME / FILL=.1
DENSITY INCOME / NORMAL AXES=NONE SCALE=NONE,
XLAB=' ' YLAB=' '
DENSITY INCOME / FILL=.1
DENSITY INCOME / KERNEL AXES=NONE SCALE=NONE,
XLAB=' ' YLAB=' '


END 
The output is:
[Figure: histograms of INCOME with an overlaid normal curve and an overlaid kernel curve; x-axis: INCOME]
If desired, you can set the bar width and axis range. However, when overlaying 
histograms with density functions, set the bar width for the histogram before defining 
the axis range. Otherwise, the graph optimization algorithm may cause the bars to fall 
unevenly on the tick marks. 
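When a density curve is drawn over a histogram of counts, the curve has to be put on the count scale: for narrow bars, the expected count in a bar is approximately n × bar width × density. This is a standard relationship, not a description of SYSTAT's internal code, and it shows that changing the bar width rescales the overlaid curve. The numbers in this Python sketch are hypothetical.

```python
from statistics import NormalDist

def density_to_counts(density, n, bar_width):
    """Convert a density height to the approximate expected count in one bar."""
    return n * bar_width * density

# Hypothetical: 256 cases, bars 5 units wide, normal curve with mean 20 and sd 15.
curve = NormalDist(mu=20, sigma=15)
peak_count = density_to_counts(curve.pdf(20), n=256, bar_width=5)
print(f"height of the curve's peak on the count scale: {peak_count:.1f}")
```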
Example 3 


Box Plots and Other Density Displays 


In this example, we display several alternatives to the histogram that are helpful for
uncovering the characteristics of a distribution. No one display is best for all data;
often you will need to examine more than one display.

We use a box plot, a dot histogram (dit), a symmetric dot density (dot), a fuzzygram,
a jittered dot density plot, and density stripes to display INCOME from the SURVEY2 data.


The input is: 


USE SURVEY2

BEGIN
DENSITY INCOME / BOX LOC=0IN,9IN
DENSITY INCOME / DIT LOC=-3IN,5IN
DENSITY INCOME / DOT LOC=3IN,5IN
DENSITY INCOME / FUZZY LOC=-3IN,-2IN
DENSITY INCOME / JITTER LOC=3IN,1IN
DENSITY INCOME / STRIPE LOC=3IN,-2IN
END
The output is:
[Figure: box plot, dot histogram, symmetric dot density, fuzzygram, jittered dot density, and density stripes for INCOME; x-axes: INCOME]


Example 4 
Multiple Displays per Panel 


When you select more than one variable for a box, histogram, dot density, or density
function plot, the resulting graphs are displayed on a single screen or, if you request,
printed in a single row or column. Below are eight plots; by default, they are arranged
in two 2-by-2 structures.


We use the OURWORLD data file, which contains data for 57 countries.
The input is:
USE OURWORLD
DEN URBAN LITERACY BABYMORT POP_1990 / FILL=.1,.1,.1,.1
The output is: 
[Figure: histograms of URBAN, LITERACY, BABYMORT, and POP_1990]


The distribution of URBAN is relatively symmetric, while that for LITERACY is very
left-skewed, and those for BABYMORT and POP_1990 are right-skewed. We reach the
same conclusions by examining the following box plots.


The input is: 


USE OURWORLD 
DEN URBAN LITERACY BABYMORT POP_1990 / BOX
The output is:

[Figure: box plots of URBAN, LITERACY, BABYMORT, and POP_1990]


The box plots show the median for each distribution and identify several outside values
(*) and far outside values (open circles) in the POP_1990 distribution.
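In Tukey's box plot, outside and far outside values are defined by fences placed 1.5 and 3 H-spreads beyond the hinges. The Python sketch below uses that standard rule; its hinge computation (medians of the lower and upper halves, including the median when n is odd) is one common convention and may differ slightly from SYSTAT's. The function names are hypothetical.

```python
import statistics

def hinges(values):
    """Tukey hinges: medians of the lower and upper halves (median included when n is odd)."""
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2
    return statistics.median(s[:half]), statistics.median(s[n - half:])

def classify_outside(values):
    """Return (outside, far_outside) using inner (1.5 H) and outer (3 H) fences."""
    lower, upper = hinges(values)
    h = upper - lower                          # H-spread
    inner = (lower - 1.5 * h, upper + 1.5 * h)
    outer = (lower - 3.0 * h, upper + 3.0 * h)
    outside, far = [], []
    for v in values:
        if v < outer[0] or v > outer[1]:
            far.append(v)                      # beyond the outer fences
        elif v < inner[0] or v > inner[1]:
            outside.append(v)                  # between inner and outer fences
    return outside, far
```

For example, in the values 1 through 10 plus 20 and 40, the hinges are 3.5 and 9.5, so 20 is an outside value and 40 is far outside.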


Example 5 
Groups or Subpopulations in Density Charts 


SYSTAT provides several ways to display information for groups or subpopulations 
within your sample: 

• In separate frames: DENSITY argument / GROUP=grpvar

• Within a single frame: DENSITY Y-varlist * X-var / options

• Back to back in a dual display: DENSITY argument / DUAL=grpvar


Separate Frames 


In this example, we use the SURVEY2 data to produce histograms of INCOME
stratified by education. We define and label five educational categories (the input data
contain integer codes 1, 2, ..., 7). To use the five new categories as a stratifying variable,
identify EDUCATN as a category variable.


The input is: 


USE SURVEY2 
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'

DEN INCOME / GROUP=EDUCATN$ FILL=.1,.1,.1,.1,.1,
XMAX=70 YMAX=30 BWIDTH=5


To plot the income values on the same scale in every frame, set the maximum to 70 for
the x axis and 30 for the y axis. To make the bar width the same in every frame, set the
width of the bars to 5. By default, SYSTAT draws the five plots in two rows. To place
the five displays in one row, use ROW=1.


The output is: 
[Figure: histograms of INCOME for each of the five education groups; x-axes: INCOME]


The distributions for subjects with less education are concentrated at the lower side of 
the income scale, while the distributions for those with more education are spread 
across the range of income. 
Within a Single Frame


Instead of separate frames, you can display the same information within a single frame.
Using the SURVEY2 data, we request a histogram and a box plot, and select EDUCATN
as the x variable and INCOME as the y variable.


The input is: 


USE SURVEY2 
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'

BEGIN
DEN INCOME * EDUCATN$ / FILL=.1 LOC=-3IN,0IN XLAB='EDUCATN'
DEN INCOME * EDUCATN$ / BOX LOC=3IN,0IN XLAB='EDUCATN'
END


The output is: 


[Figure: histograms (left) and box plots (right) of INCOME for each education category; x-axes: EDUCATN]


The open circle in the box plot indicates that an income of $65,000 is a far outside 
value for the high school graduate group. 
Notched Box Plots


McGill, Tukey, and Larsen (1978) introduced confidence intervals on the medians of
several groups in a box plot. If the intervals around two medians do not overlap, you
can be confident at about the 95% level that the two population medians are different.
Below is a notched box plot of INCOME by EDUCATN. The boxes are notched at
the median and return to full width at the lower and upper confidence interval values.
Notice that the outer confidence limit for the 'Degree +' box extends beyond the
hinges (the horizontal lines on either side of the narrow median line). Although not
aesthetically pleasing, this display adheres to Tukey and McGill's original standard for the plot.
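A common form of the McGill, Tukey, and Larsen notch is median ± 1.57 × IQR / √n. Software packages differ slightly in the constant and in how the quartiles are computed, so treat this Python sketch as an approximation, not SYSTAT's exact formula; the function names are hypothetical. The overlap check mirrors the rule of thumb described above.

```python
import math
import statistics

def notch(values, const=1.57):
    """Approximate median confidence interval: median +/- const * IQR / sqrt(n)."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    half_width = const * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width

def notches_overlap(a, b):
    """True if the two notch intervals overlap (medians not clearly different)."""
    lo_a, hi_a = notch(a)
    lo_b, hi_b = notch(b)
    return lo_a <= hi_b and lo_b <= hi_a
```

Because the half-width shrinks with √n, a notch can extend past a hinge when a group is small or its quartiles are far apart, which is the situation noted for the 'Degree +' box.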


The input is: 

USE SURVEY2 

RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'

ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
DEN INCOME * EDUCATN$ / BOX NOTCH XLAB='EDUCATN'


The output is: 


[Figure: notched box plots of INCOME by education category; y-axis: INCOME, x-axis: EDUCATN]


Dual Displays 


A Mirror (Dual) display is another alternative when the grouping variable has only two
levels. Below are a histogram and a dot histogram with income stratified by sex.
The input is:
USE SURVEY2 


BEGIN 
DEN INCOME / DUAL=SEX$ FILL=.8,.1 LEGEND=3IN,-1.2IN,
LOC=-3IN,0IN
DEN INCOME / DUAL=SEX$ DIT SYMBOL=1,5 FILL=1,.3,
LEGEND=3IN,-1.2IN LOC=2.5IN,0IN


END 


The output is: 


Example 6 
Stratification by Multiple Variables 


The box, dot histogram, and symmetrical dot density plots can produce separate
displays for each level of a stratifying variable, aligned on a common scale in a single
frame. To do this, select a y variable, which functions as the measure of interest, and
an x variable, which functions as the stratifying variable. Although you can use only one
stratifying x variable this way, you can also select a grouping variable to stratify the
results further. In the graphs below, we've selected EDUCATN as the x variable, INCOME
as the y variable, and SEX$ as the grouping variable.

Here, within each frame of the box plot and symmetric dot density (dot) plots, we
stratify by education, and across frames, we stratify by SEX$.
The input is:
USE SURVEY2 
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BEGIN
DEN INCOME * EDUCATN$ / BOX GROUP=SEX$ XLAB=' ' LOC=0IN,1.5IN
DEN INCOME * EDUCATN$ / DOT GROUP=SEX$ SIZE=1.3,
FILL=1 LOC=0IN,-1.5IN
END




Example 7 
Transposing the Display and Logging the Data 


Here we log the data and transpose the display, but first we show the data in the original
units. We use the OURWORLD data and plot the money (converted to U.S. dollars) that
57 countries spend per person on health. The expenditures are grouped by the type of
country (GROUP$).


The input is: 


USE OURWORLD
BEGIN
BOX HEALTH * GROUP$ / YLAB='Health Dollars (per person)',
XLAB='Type of Country',
TRANS LOC=-3IN,0IN
BOX HEALTH * GROUP$ / YLAB='Log Health Dollars',
XLAB='Type of Country',
TRANS YLOG LOC=2.5IN,0IN
END
The output is: 
[Figure: transposed box plots of health spending by type of country; left x-axis: Health Dollars (per person), right x-axis: Log Health Dollars]


In the left plot, open circles mark far outside values or outliers. The distribution for 
these untransformed data is skewed. On the right, after transformation, the distribution 
is more symmetric, and the differences in variance are reduced. 
Example 8
Bivariate Histograms 


Let's use the SURVEY2 data to study the joint distribution of INCOME and TOTAL,
where the latter is a measure of depression symptoms experienced by each subject
during the previous week (if the score is 16 or higher, the subject is considered
depressed). The density is computed by SYSTAT and displayed on the z axis, creating
a 3-D display. Notice that since there is no variable in the data file to display on the z
axis, we use a period (SYSTAT's flag for missing values) to specify the z variable.
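The height on the z axis of a bivariate histogram is the count of cases falling in each rectangular cell of an x-by-y grid. This minimal Python sketch uses hypothetical bin edges and data, not SYSTAT's own binning rules, and the function names are hypothetical.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) holding value; last bin is closed."""
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, value) - 1

def histogram2d(xs, ys, x_edges, y_edges):
    """Count cases in each (x, y) cell; counts[i][j] is x bin i, y bin j."""
    counts = [[0] * (len(y_edges) - 1) for _ in range(len(x_edges) - 1)]
    for x, y in zip(xs, ys):
        i, j = bin_index(x, x_edges), bin_index(y, y_edges)
        if i is not None and j is not None:    # skip cases outside the grid
            counts[i][j] += 1
    return counts
```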


The input is: 
USE SURVEY2 
BEGIN 
DEN . * INCOME * TOTAL / LOC=-3IN,0IN 
DEN . * INCOME * TOTAL / XREV YREV LOC=3IN,0IN
END 


The output is: 


[Figure: bivariate histograms of INCOME and TOTAL; the right panel is drawn with both x and y axes reversed]


Notice that by reversing the scale of both the x and y axes, the plot on the right
corresponds to a 180-degree rotation of the plot on the left. Alternatively, you can use
the Dynamic Explorer to rotate bivariate densities.

In the rotated display on the right, it is easy to see that people with high incomes 
(over $50,000) have low depression scores, while those with severe depression 
symptoms (TOTAL > 30) have relatively low incomes. 
Example 9
Kernel and Normal Surfaces 


Bivariate nonparametric kernel density estimators are like continuous histograms and
show where the sample data are most concentrated. This method uses the
Epanechnikov kernel; see Silverman (1986) for more information. For comparison, a
kernel density surface and a bivariate normal density surface are shown below.


The input is: 


USE SURVEY2 
BEGIN 
DEN . * INCOME * TOTAL / KERNEL XREV YREV ZMAX=10,
LOC=-3IN,0IN
DEN . * INCOME * TOTAL / NORMAL XREV YREV,
LOC=3IN,0IN


END 


The output is: 


[Figure: kernel (left) and bivariate normal (right) density surfaces for INCOME and TOTAL; z-axis: Count]


Kernel and Normal Contours 


You can use the Contour option to collapse a bivariate density into a 2-D display. Use
the Tick option to cut the z axis in more places than the default (and, thus, display more
contours). We do this for both the nonparametric kernel density and the bivariate
normal density.
The input is:
USE SURVEY2
BEGIN
DEN . * INCOME * TOTAL / KERNEL CONTOUR XREV,
YREV ZTICK=20 LOC=-3IN,0IN
DEN . * INCOME * TOTAL / NORMAL CONTOUR XREV,
YREV ZTICK=20 LOC=3IN,0IN
END
Adding the Tile Option 
Use the Tile option to collapse a bivariate density into a 2-D display. 
The input is: 
USE SURVEY2 
BEGIN 


DEN . * INCOME * TOTAL / KERNEL CONTOUR XREV,
YREV ZTICK=20 TILE CUT=40,
LOC=-3IN,0IN LEGEND=NONE
DEN . * INCOME * TOTAL / NORMAL CONTOUR XREV,
YREV ZTICK=20 TILE CUT=40,
LOC=3IN,0IN


END 
Stratifying and Rotating the Displays


You can stratify bivariate density 3-D displays using grouping variables. Here we
stratify the sample by sex.


The input is: 


USE SURVEY2 
DEN . * INCOME * TOTAL / KERNEL GROUP=SEX$ XREV,
YREV SURFACE=XYCUT,
ZMIN=0 ZMAX=15


The output is: 


After these 3-D displays appear on the screen, you can rotate them as you can for single 
frame displays. 
4
Quantile and Probability Plots 


Leland Wilkinson 


Quantile plots and probability plots are useful for studying the distribution of a
variable. Quantile Plot produces quantile plots, or Q plots. Unlike probability plots,
which compare a sample to a theoretical probability distribution, a quantile plot
compares a sample to its own quantiles (a one-sample plot) or to another sample (a
two-sample, or Q-Q, plot). The quantile of a sample is the data point corresponding
to a given fraction of the data. Probability Plot plots the values of a variable against
the corresponding percentage points of a theoretical distribution such as the normal,
chi-square, t, F, uniform, binomial, logistic, exponential, gamma, Weibull, Gompertz,
Gumbel, or Studentized range. Graphs like this are called probability plots, or P
plots. You can also plot the expected values of one variable against those of another
(a P-P plot). SYSTAT can produce probability plots for 38 probability distributions,
both discrete and continuous.
Sample Displays


One-sample quantile plot


Quantile Plot Dialog Box


Quantile plots are useful for studying the distribution of a variable. The quantile of a 
sample is the data point corresponding to a given fraction of the data. To plot each 
variable against its quantiles, select one or more x variables, or, to plot quantiles of one 
variable against those of another, select an x variable and a y variable. (This second 
type of plot is called a two-sample, or Q-Q, plot.) 

A one-sample quantile plot looks like a cumulative sample distribution function. A
sample from a normal distribution, for example, should plot in an S, or ogive, shape. A
sample from a uniform distribution should plot approximately as a straight line. A
sample from a skewed distribution should plot as an asymmetric function. A two-sample
quantile plot relates two sample distributions. It should look approximately like
a straight line if the two samples follow the same distribution.
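The plots described above can be sketched directly: a one-sample quantile plot pairs each sorted value with a cumulative fraction, and a two-sample (Q-Q) plot pairs the sorted values of two samples. This Python sketch assumes the fraction (i - 0.5)/n for the i-th smallest value and equal sample sizes for the Q-Q plot; SYSTAT's exact plotting positions and its handling of unequal sample sizes are not stated here, and the function names are hypothetical.

```python
def quantile_plot_points(values):
    """One-sample quantile plot: (sorted value, fraction) pairs, fraction = (i - 0.5) / n."""
    s = sorted(values)
    n = len(s)
    return [(v, (i - 0.5) / n) for i, v in enumerate(s, start=1)]

def qq_plot_points(x_values, y_values):
    """Two-sample Q-Q plot for equal-size samples: i-th smallest x with i-th smallest y."""
    if len(x_values) != len(y_values):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(x_values), sorted(y_values)))
```

If one sample is a linear function of the other, every Q-Q pair falls on a single straight line.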
To open the Quantile Plot dialog box, from the menus choose:
Graph 
Distribution Plots 
Quantile Plot... 


[Figure: Graph:Distribution Plots:Quantile Plot dialog box]


X and Y variable(s). Variables to be plotted. You must select an x variable. The y
variable is optional. Select more than one x or y variable to obtain side-by-side or
overlaid plots.
Grouping variable(s). Categorical variables that specify group membership.
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 


Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


MultiPlot. Creates a table of two-dimensional quantile plots with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 


Univariate density display on border. For 2-D plots, you can produce plots bordered 
by histograms, gap histograms, frequency polygons, fuzzygrams, box plots, notched 
box plots, dit or dot displays, jittered dot density, density stripes, kernel and normal 
curves. 


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different symbols distinguish separate plots or 
subpopulations. 


In addition, you can change coordinate systems, add a smoother, plot the residuals from 
a smoother, and customize the layout, axes, and appearance of the plots. 


Probability Plot Dialog Box 


Probability Plot offers 38 distributions against which you can plot a single variable:
uniform, beta, logit normal, triangular, logistic, double exponential, t, Cauchy,
Gumbel, normal, gamma, chi-square, Gompertz, lognormal, F, Weibull, Pareto,
Rayleigh, inverse Gaussian, exponential, Studentized range, Erlang, half-normal,
loglogistic, non-central chi-square, non-central F, non-central t, smallest extreme
value, studentized maximum modulus, discrete uniform, Poisson, binomial, negative
binomial, geometric, hypergeometric, Zipf, Benford's Law, and logarithmic series. These
distributions can be used for various purposes. For example, if you select the normal
distribution, the resulting display can be used to screen data or residuals for
non-normality or for the presence of outliers. The observed data values are plotted along
the horizontal axis. The expected values for each observation are plotted on the vertical
axis.
If the data are from a normal distribution, the plotted values fall approximately
along a straight line extending from the lower left corner of the plot toward the upper
right corner. For a sample of size 4, the expected value of the smallest observation from
a normal distribution is -1.03, the next smallest is -0.30, the next to the largest is 0.30,
and the largest is 1.03. For a sample of size 20, the expected values are -1.87, -1.41,
-1.13, ..., 1.13, 1.41, and 1.87. SYSTAT estimates the expected normal value
corresponding to each observation as the standard normal value corresponding to the
probability (r - 0.5)/n, where r is the rank of the observation and n is the sample size.
For more information, see Chambers, Cleveland, Kleiner, and Tukey (1983).
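The (r - 0.5)/n rule can be written in a few lines of Python with the standard library's NormalDist. Because it is a plotting-position approximation, the values it gives for small samples differ somewhat from the exact expected values quoted above (for n = 4, the smallest is about -1.15 rather than -1.03). The function name is hypothetical, and ties are simply broken by case order.

```python
from statistics import NormalDist

def expected_normal_scores(values):
    """Return, for each case, the standard normal quantile of (r - 0.5) / n, r = rank."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])   # case indices, smallest value first
    scores = [0.0] * n
    for r, k in enumerate(order, start=1):
        scores[k] = NormalDist().inv_cdf((r - 0.5) / n)
    return scores
```

Plotting each value against its score gives the normal probability plot described above.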


All items work the same way, except that you can specify parameters for 30 of the 38
distributions:


BENFORD=B          BETA=shp1,shp2        BINOMIAL=n,p        CHISQ=df
DUNIFORM=N         ERLANG=shp,sc         ENORMAL=loc,sc      F=df1,df2
GAMMA=shp,sc       GEOMETRIC=p           GOMPERTZ=b,c        HGEOMETRIC
IGAUSSIAN=loc,sc   LLOGISTIC=logsc,shp   LNORMAL=loc,sc      LSERIES=theta
NBINOMIAL=k,p      NCCHISQ=df,delta      NCF=df1,df2,delta   NCT=df,delta
PARETO=thr,shp     POISSON=mean          RANGE=k,df          RAYLEIGH=sc
SMM=k,df           SEV=loc,sc            t=df
WEIBULL=sc,shp     ZIPF=shp


Instead of specifying a single variable, you can specify both an x variable and a y 
variable. If you specify both, the resulting P-P plot shows the expected values for each 
variable using a method proposed by Holmgren (1995). 
To open the Probability Plot dialog box, from the menus choose:
Graph 
Distribution Plots 
Probability Plot... 


[Figure: Graph:Distribution Plots:Probability Plot dialog box]


X and Y variable(s). Variables to be plotted. You must select an x variable. Optionally, 
you can select a variable for the y axis. Select more than one x or y variable to obtain 
side-by-side or overlaid plots. 




Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 

Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 


MultiPlot. Creates a table of two-dimensional probability plots with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 

Distribution. If you are plotting against a theoretical distribution, you can specify the
type of distribution. Options include uniform, beta, logit normal, triangular, logistic,
double exponential, t, Cauchy, Gumbel, normal, gamma, chi-square, Gompertz,
lognormal, F, Weibull, Pareto, Rayleigh, inverse Gaussian, exponential, Studentized
range, Erlang, half-normal, loglogistic, non-central chi-square, non-central F,
non-central t, smallest extreme value, studentized maximum modulus, discrete uniform,
Poisson, binomial, negative binomial, geometric, hypergeometric, Zipf, Benford's
Law, and logarithmic series. Distributions are not available when a y variable is
selected.


Univariate density display on border. For 2-D plots, you can produce plots bordered 
by histograms, gap histograms, frequency polygons, fuzzygrams, box plots, notched 
box plots, dit or dot displays, jittered dot density, density stripes, kernel and normal 
curves. 


Display scale in probability units. Changes the scale to probability units (expected
fractions of the selected distribution).


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different symbols distinguish separate plots or 
subpopulations. 


In addition, you can change coordinate systems, add a smoother, plot the residuals from 
a smoother, and customize the layout, axes, and appearance of the plots. 
Quantile and Probability Plot Options


[Figure: Options tab of the Graph:Distribution Plots:Probability Plot dialog box]


The following options are available: 

Confidence ellipse. Draws Gaussian bivariate ellipses for the sample in each plot or 
Gaussian bivariate confidence intervals on the centroid. The difference between Ell 
and Elm is analogous to the standard deviation versus the standard error of the mean. 
• Sample. With Ell, the resulting ellipse is centered on the sample means of the x and
y variables. The unbiased sample standard deviations of x and y determine its major
axes, and the sample covariance between x and y determines its orientation. You can
choose the size of the ellipse by specifying a probability value between 0 and 1 (both
exclusive). If you make an extremely large ellipse (0.99), it may extend beyond the
axes of your plot. The default is 0.6827.

• Centroid. As with Ell, the ellipse produced by Elm is centered on the sample means
of the x and y variables. The unbiased sample standard deviations of x and y
determine its major axes, and the sample Pearson correlation between x and y
determines its orientation. You can choose the size of the ellipse by specifying a
probability value between 0 and 1 (both exclusive). This size is adjusted by the
sample size so that the ellipse is always smaller than that produced using Ell. The
default is 0.95.
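One textbook way to build such an ellipse is to scale the sample covariance matrix by the chi-square (2 df) quantile for the chosen probability and, for the centroid version, to divide the covariance by n. The manual does not spell out SYSTAT's exact scaling, so treat the radius formula in this Python sketch as an assumption; the function name is hypothetical.

```python
import math
import statistics

def ellipse_points(xs, ys, p=0.6827, centroid=False, steps=72):
    """Boundary points of a Gaussian ellipse for paired data (xs, ys).

    Assumed scaling: squared Mahalanobis radius k = -2 ln(1 - p), the
    chi-square (2 df) quantile for probability p. With centroid=True the
    covariance matrix is divided by n, the standard-error analog.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    a = statistics.variance(xs)                       # unbiased variance of x
    c = statistics.variance(ys)                       # unbiased variance of y
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # unbiased covariance
    if centroid:
        a, b, c = a / n, b / n, c / n
    k = -2.0 * math.log(1.0 - p)
    # Eigenvalues and major-axis angle of the 2x2 matrix [[a, b], [b, c]].
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = mid + rad, max(mid - rad, 0.0)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    r1, r2 = math.sqrt(k * lam1), math.sqrt(k * lam2)
    points = []
    for i in range(steps):
        t = 2.0 * math.pi * i / steps
        u, v = r1 * math.cos(t), r2 * math.sin(t)     # point on the axis-aligned ellipse
        points.append((mx + u * math.cos(theta) - v * math.sin(theta),
                       my + u * math.sin(theta) + v * math.cos(theta)))
    return points
```

Dividing the covariance by n shrinks both axes by √n, which is why the centroid ellipse is always the smaller of the two, as the text states.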

Confidence kernel(p). A nonparametric kernel density estimator is analogous to a
continuous histogram that shows where the data are most concentrated in the sample.
You can specify a probability value between 0 and 1 (both exclusive).


Convex hull around all points. You can draw a convex hull around all the points in the 
scatterplot. 
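A convex hull can be computed with Andrew's monotone chain algorithm, one standard method; the manual does not say which algorithm SYSTAT uses. A minimal Python sketch with a hypothetical function name:

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a counterclockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                              # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                    # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]             # drop endpoints shared by both chains
```

Points strictly inside the hull, such as the center of a square, are discarded.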

Influence on correlation coefficient. Makes the size of the plot symbol represent the 
extent of influence each point exerts on the Pearson correlation coefficient. A scale to 
the right of the plot helps you judge the extent of influence. (The influence of a point 
is the amount the correlation would change if that point were deleted.) 
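Following the definition just given, a point's influence is the change in the Pearson correlation when that point is deleted. This is a leave-one-out Python sketch; the sign convention (full-sample r minus r without the point) and the function names are assumptions, and the mapping from influence to symbol size is not shown.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_influence(xs, ys):
    """For each case i: r using all cases minus r with case i deleted."""
    r_all = pearson_r(xs, ys)
    influence = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        influence.append(r_all - pearson_r(rest_x, rest_y))
    return influence
```

For example, with five points on a rising line plus one point far below it, deleting that stray point moves r all the way to 1, so it has by far the largest influence.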


Overlapping data. When many points overlap on a scatterplot, you can add a slight 
random jitter to make the points easier to distinguish, or draw a sunflower plot, where 
each symbol or "sunflower" represents one or more cases. Sunflowers are lighter or 
darker depending on the number of cases. Only nine symbols are possible, so larger 
counts are plotted with a filled circle. 


Connectors/partitions. Draw lines connecting or partitioning points in a number of 
ways. 
• Line connected in case order. Connects the points in the order in which the cases
appear in the data file.


• Traveling salesman path. Tries to find the shortest possible closed path that
connects all the points, with no repetitions.


• Minimum spanning tree. Connects a set of points in a manner that minimizes the
sum of the lengths of the connecting line segments.


• Vertical spikes to Y. Draws spikes to a specified value in the y plane.

• Vector lines from. Connects each point to a single point that you specify.

• Delaunay triangulation. Connects each pair of points whose Voronoi polygons share
an edge, partitioning the region inside the convex hull of the points into triangles.

• Voronoi tessellation. Also known as the Dirichlet tessellation or the Thiessen
diagram, this option produces straight boundaries halfway between points. This
option presumes you have equivalent distances on the x and y axes.

• Use hexagonal binning. Select this check box to apply hexagonal binning to the plot.

• Number of hex grid cuts. When Use hexagonal binning is selected, you can specify
the number of cuts in the hexagonal grid in this edit box. The maximum number of
cuts is 50 and the minimum is 2.
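The minimum spanning tree in the list above can be built with Prim's algorithm, one standard method; the manual does not say which algorithm SYSTAT uses. This Python sketch treats the x and y units as equivalent, measures straight-line distance, and takes O(n²) time, which is adequate for scatterplot-sized data. The function name is hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns a list of (i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the cheapest point not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):     # update connection costs through the new point
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

A tree on n points always has n - 1 segments; for the points (0, 0), (1, 0), (2, 0), and (0, 5), the total length is 1 + 1 + 5 = 7.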


Using Commands 


After specifying the data file with USE filename: 


For a one-sample quantile plot QPLOT x-varlist / options 

For a two-sample (Q-Q) plot QPLOT y-var * x-var / options 

For a one-sample probability plot PPLOT x-varlist / distribution options 
For a two-sample (P-P) plot PPLOT y-var * x-var / options 


Replace distribution with BETA, BINOMIAL, CHISQ, DUNIFORM, F, GAMMA, 
GEOMETRIC, GOMPERTZ, HGEOMETRIC, NBINOMIAL, IGAUSSIAN, LNORMAL, 
ENORMAL, WEIBULL, RANGE, POISSON, PARETO, RAYLEIGH, t, TRIANGULAR, ZIPF, 
NORM, CAUCHY, LOGISTIC, UNIFORM, EXPO, DEXP, GUMBEL, BENFORD, LSERIES, 
ERLANG, HNORMAL, LLOGISTIC, NCCHISQ, NCF, NCT, or SMM.


Examples 


Example 1 
Quantile Plots (One-Sample) 


A one-sample quantile plot compares a variable to its own quantiles. The following is
a quantile plot of GOVSLRY, the 1992 governors’ salaries in thousands of dollars. The 
USSTATES data file contains one case for each U.S. state. 
The input is:


USE USSTATES 
QPLOT GOVSLRY / FILL=1 


The output is: 


[Figure: one-sample quantile plot of GOVSLRY; x-axis: GOVSLRY]


The plot is slightly S-shaped, indicating that the distribution is close to normal. The 
smallest value is $35,000 (Bill Clinton's salary for the year in which he ran for 
president), while the largest value is Mario Cuomo's salary as the governor of New 
York, $130,000. 


Reading Quantiles 


“Quantile” is an alternative name for percentile when fractions, rather than 
percentages, are used. For example, the 0.5 quantile is the 50th percentile, the median, 
or the second quartile. Thus, quantile plots are useful for describing the distribution of 
a variable. 

The plot of the governors' salaries is shown again below, with the addition of a grid 
to enable you to read the plot scales more easily. 


The input is: 


USE USSTATES 
QPLOT GOVSLRY / YTICK=10 XGRID YGRID FILL=1
The output is:


[Figure: quantile plot of GOVSLRY with x and y grid lines; x-axis: GOVSLRY]


To read the median salary of the U.S. governors, start at 0.5 on the vertical axis and 
follow its horizontal grid line to the right until the line intersects a data point. Then, 
imagine dropping a vertical line to the horizontal plot scale below. This value is the 
median, approximately 80, or $80,000. In a similar manner, starting at 0.1, you can see
that the governors in 10% of the states earn less than approximately $65,000. At 0.9, only
10% of states have governors earning more than $105,000.
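Reading a quantile off the plot amounts to interpolating in the sorted data. Python's statistics.quantiles does this; with n=10 it returns the nine deciles, whose first, fifth, and ninth entries correspond to the 0.1, 0.5, and 0.9 readings described above. The salaries in this sketch are hypothetical, not the USSTATES values, and the library's default interpolation method may place a quantile slightly differently from where you would read it on the plot.

```python
import statistics

# Hypothetical salaries in thousands of dollars (not the USSTATES data).
salaries = [35, 50, 60, 64, 70, 72, 75, 78, 80, 82, 85, 86, 90, 95, 100, 104, 110, 120, 130]
deciles = statistics.quantiles(salaries, n=10)   # nine cut points: 0.1, 0.2, ..., 0.9
print(f"0.1 quantile: {deciles[0]:.1f}")
print(f"median:       {deciles[4]:.1f}")
print(f"0.9 quantile: {deciles[8]:.1f}")
```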


Example 2 
Quantile Plots (Two-Sample)


A two-sample quantile plot can reveal the area of the distribution where the samples 
differ the most. These are sometimes called Q-Q, or quantile-quantile, plots. 

The following is a Q-Q plot of LIFEEXPF (life expectancy for females) against
LIFEEXPM (life expectancy for males) from the OURWORLD data file. These data
were obtained from a United Nations report and contain one case for each of 57
countries.


The input is: 
USE OURWORLD 
QPLOT LIFEEXPF * LIFEEXPM / XGRID YGRID FILL=1, 
XLAB='Male Life Expectancy', 
YLAB='Female Life Expectancy'
The output is:


[Figure: Q-Q plot of Female Life Expectancy (LIFEEXPF) against Male Life Expectancy (LIFEEXPM), with grid lines]


Example 3 
Transformations 


By using SYSTAT's feature for transforming data, it is easy to assess results from a
series of power transformations of the form y^lambda. This example uses the USSTATES data
file and displays quantile plots of 1990 population values (for each of the 50 U.S.
states) using lambda equal to 1 (no transformation), 0.5 (square root), 0.25, and 0 (log).
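Why lambda = 0 stands for the log in this ladder: the power family is often written in the scaled (Box-Cox) form below. For lambda > 0 this form is a positive linear rescaling of y^lambda, so its quantile plot has the same shape, and as lambda approaches 0 it tends to log y. This is a standard identity for positive y, not a description of how SYSTAT implements XPOW or XLOG.

```latex
y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}, \qquad
\lim_{\lambda \to 0} \frac{y^{\lambda} - 1}{\lambda}
  = \lim_{\lambda \to 0} \frac{e^{\lambda \ln y} - 1}{\lambda}
  = \lim_{\lambda \to 0} \frac{\lambda \ln y + O(\lambda^{2})}{\lambda}
  = \ln y \qquad (y > 0).
```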


The input is: 
USE USSTATES
BEGIN
QPLOT POP90 / FILL=1 TITLE='No transformation',
LOC=0IN,0IN
QPLOT POP90 / XPOW=.5 FILL=1 TITLE='Power = 0.5',
LOC=5IN,0IN
QPLOT POP90 / XPOW=.25 FILL=1,
TITLE='Power = 0.25', LOC=0IN,-6IN
QPLOT POP90 / XLOG FILL=1 TITLE='Log', LOC=5IN,-6IN
END
The output is:

[Figure: four quantile plots of POP90, titled No transformation, Power = 0.5, Power = 0.25, and Log]


The distribution for the data that are not transformed (top left plot) is very
right-skewed. As the values of lambda change from 1.0 to 0, the distribution becomes more
symmetric, and its shape approaches an S. For analyses that require the assumption of
normality, it is preferable to use POP90 in log units (bottom right plot).


Example 4 
Normal Probability Plot 


This example uses the OURWORLD data file. The following are normal probability
plots of POP_1990 (1990 population in millions) for the data as recorded and after a log
base 10 transformation (to determine whether logging improves normality for the
data). The midrange smoother is added to assess how closely the plot points follow a
line connecting the first and third quartiles.
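A normal probability plot pairs each sorted value with the standard normal quantile expected at its rank, and the midrange line passes through the points at the first and third quartiles. A Python sketch under stated assumptions: plotting positions (i - 0.5)/n and the `statistics.quantiles` default method, either of which may differ from SYSTAT's choices.

```python
from statistics import NormalDist, quantiles


def normal_prob_points(values):
    """(data value, expected normal value) pairs for a normal probability plot.

    Expected values use the plotting position (i - 0.5)/n, an assumed
    convention; SYSTAT's exact formula may differ.
    """
    xs = sorted(values)
    n = len(xs)
    z = NormalDist()  # standard normal
    return [(x, z.inv_cdf((i + 0.5) / n)) for i, x in enumerate(xs)]


def midrange_line(values):
    """Slope and intercept of the line through (Q1, z(0.25)) and (Q3, z(0.75))."""
    q1, _, q3 = quantiles(values, n=4)
    z = NormalDist()
    z1, z3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    slope = (z3 - z1) / (q3 - q1)
    return slope, z1 - slope * q1
```

Points that stay close to the midrange line suggest approximate normality; systematic curvature away from it suggests skewness.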


The input is: 


USE OURWORLD
BEGIN
PPLOT POP_1990 / NORMAL SMOOTH=MIDRANGE,
YLAB='Expected Value' FILL=1,
TITLE='No Transformation',
LOC=-1IN,0IN SIZE=1.5
PPLOT POP_1990 / NORMAL SMOOTH=MIDRANGE XLOG,
PSCALE YLAB='Expected Fraction of Data',
FILL=1 TITLE='Transformation and Pscale',
LOC=5IN,0IN SIZE=1.5
END


(You could omit NORMAL, which is the default.) 
The output is:

[Figure: two normal probability plots of POP_1990, titled No Transformation and Transformation and Pscale]

If the data follow a normal distribution, the values will fall approximately along a
straight line. The data in the plot on the left clearly do not. The plot on the right shows
that the log transformation clearly improves normality. Thus, for analyses based on
normal theory, POP_1990 should be analyzed in log units.


Detrended Probability Plots 


Instead of a midrange smoother, let us fit a straight line to the points within each plot
and examine the residuals from the line. As in the normal probability plots shown
previously, we create two charts, one with no transformation and the other with a log
transformation on the x axis.
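A detrended plot shows the departures of the probability-plot points from a fitted straight line. The Python sketch below regresses the expected normal values on the sorted data and standardizes the residuals; the plotting positions and the standardization by the residual standard deviation are assumptions, not SYSTAT's documented formulas.

```python
from statistics import NormalDist, fmean, pstdev


def detrended_residuals(values):
    """Standardized residuals from a least-squares line through the normal
    probability plot points (expected normal value regressed on the data).

    Plotting positions (i - 0.5)/n and standardizing by the residual
    standard deviation are assumptions.
    """
    xs = sorted(values)
    n = len(xs)
    z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mz = fmean(xs), fmean(z)
    b = sum((x - mx) * (zi - mz) for x, zi in zip(xs, z)) / sum((x - mx) ** 2 for x in xs)
    a = mz - b * mx
    resid = [zi - (a + b * x) for x, zi in zip(xs, z)]
    s = pstdev(resid)
    return [r / s for r in resid] if s > 0 else resid
```

Because the line is fitted by least squares, a single extreme point can pull it strongly, which is why the midrange smoother can be preferable when one case dominates.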


The input is: 


USE OURWORLD
BEGIN
PPLOT POP_1990 / RESID=LINEAR YLIMIT=0,
SMOOTH=LOWESS BORDER=STRIPE,
YLAB='Standardized Residual' FILL=1,
LOC=-1IN,0IN SIZE=1.4
PPLOT POP_1990 / RESID=LINEAR YLIMIT=0,
SMOOTH=LOWESS BORDER=STRIPE XLOG,
YLAB='Standardized Residual',
FILL=1 LOC=5IN,0IN SIZE=1.4
END
The output is:

[Figure: two detrended probability plots of POP_1990 (standardized residuals), without and with a log transformation]


Barbados, the smallest country, exerts great influence on the line of best fit for the 
logged data. Consequently, we prefer the midrange smoother above. 


Example 5 
Normal Random Data Generation 


In this example, we used the SYSTAT normal random number generator, ZRN, to
produce 150 normal random numbers. You can try random number seeds other
than 1339, the seed shown in the commands. The commands save the numbers in the RAND data file.


The input is: 


NEW 

REPEAT 150 
RSEED 1339 
LET Z = ZRN()
SAVE RAND


To produce the normal probability plot of the data, first read the RAND data file. 


The input is: 


USE RAND 
PPLOT Z / FILL=1
The output is:

[Figure: normal probability plot of Z; axis label: Normal(0.0, 1.0) Quantile]


Example 6 
Symbol Size 


You can use the values of a variable in your file to control the size of plot symbols. 
Suppose that you have two variables and you want to see whether the marginal 
distribution of one of them is related in some way to the marginal distribution of the 
other. You should usually use empty symbols with this type of plot, since filled 
symbols can occlude each other and make the plot difficult to interpret. 

In this example, we are interested in crime rates for violent acts (VIOLRATE in the
USSTATES data file) and whether their distribution is related to voter turnout in
1992. We request a normal probability plot of VIOLRATE and use information about
the percentage voting to scale the size of the plot points. VOTE 92 serves as the
bubble-sizing variable.


The input is: 


USE USSTATES 
PPLOT VIOLRATE / NORMAL SIZE=VOTE 92 LEGEND=NONE,
FILL=1
The output is:

[Figure: normal probability plot of VIOLRATE with symbols sized by 1992 voter turnout; axis label: Normal(0.0, 1.0) Quantile]


Notice that the size of the bubbles is not randomly distributed across the values of 
VIOLRATE. The three states with the highest rates of violent crime (NY, FL, and CA) 
at the top right have relatively low voter turnouts when compared with the states at the 
lower left (ME, MN, and WI) with low crime rates. The slight arc of the bubbles 
indicates that the distribution of VIOLRATE is moderately right-skewed. 


Example 7 
Chi-Square Probability Plot 


A chi-square variable is the sum of the squares of one or more independent standard
normal variables. In this example, we plot the data against the quantiles of a chi-square
distribution, specifying the degrees of freedom (df) to be four.
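With 4 degrees of freedom the chi-square distribution function has the closed form $F(x) = 1 - e^{-x/2}(1 + x/2)$, so expected quantiles can be found by simple bisection. A Python sketch, assuming plotting positions (i - 0.5)/n (SYSTAT may use another convention):

```python
import math


def chi2_cdf_df4(x):
    """CDF of the chi-square distribution with 4 degrees of freedom (closed form)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x / 2) * (1 + x / 2)


def chi2_quantile_df4(p, tol=1e-12):
    """Invert the CDF by bisection; requires 0 < p < 1."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_df4(hi) < p:  # expand the bracket until it contains p
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_df4(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_prob_points(values):
    """(data value, expected chi-square(4) value) pairs, positions (i - 0.5)/n assumed."""
    xs = sorted(values)
    n = len(xs)
    return [(x, chi2_quantile_df4((i + 0.5) / n)) for i, x in enumerate(xs)]
```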

Where can you get chi-square data to analyze? Typically, sums of squares in 
analysis of variance and Mahalanobis squared distances in multivariate analysis are 
chi-square-distributed when the normality assumptions are appropriate—see, for 
example, Gnanadesikan (1977) or Winer, Brown and Michels (1991). If you have a 
2 × 2 × 2 × 2 ... analysis of variance, for example, the sums of squares for all effects
are single-degree-of-freedom chi-square variables. Probability plots show you which 
effects stand out. 

The following example plots Mahalanobis distances of the countries from the 
centroid of all European countries on GDP_CAP (gross domestic product per capita), 
MIL (military expenditure per person), B_TO_D (birth to death ratio), and LITERACY
(literacy rate). We computed these distances using the Matrix procedure. 
The input is:


USE OURWORLD / MATRIX=zzz

MSELECT GROUP = 1

MLET (GDP_CAP, MIL) = L10(@)

MAT x = zzz(; GDP_CAP, MIL, B_TO_D, LITERACY)
MAT x = x - COLMEAN(x)

MAT dsquare = TRP(DIAG(x*INV(COVA(x))*TRP(x)))
MSAVE DSQUARE1

USE OURWORLD

IF (GROUP$='Europe') THEN LET C$=COUNTRY$
DSAVE DSQUARE2

MERGE DSQUARE2 (C$) DSQUARE1

DROP ROWNAMES

DSAVE DSQUARE


The Mahalanobis distances are saved under the name DIAGONAL. The distances have 
four degrees of freedom because they are distances squared and summed over four 
variables. 
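The MAT commands center each column, form the covariance matrix, and take the diagonal of $x S^{-1} x^T$. The Python sketch below computes the same squared distances row by row, solving $S v = x_i$ instead of inverting $S$. It assumes the covariance uses the unbiased divisor n - 1; with that divisor the distances always sum to (n - 1) times the number of variables, which serves as a check.

```python
def covariance(rows):
    """Centered rows and the unbiased sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return centered, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the column means."""
    centered, cov = covariance(rows)
    return [sum(d * v for d, v in zip(c, solve(cov, c))) for c in centered]
```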


You can now request the plot. 


The input is: 


USE DSQUARE 

BEGIN

PPLOT DIAGONAL / CHISQ=4 SMOOTH=MIDRANGE FILL=1,
TITLE='Midrange Smoother' LOC=-1IN,0IN,
YLAB='Expected Value for Chi-square (4)'

PPLOT DIAGONAL / CHISQ=4 LABEL=C$ PSCALE FILL=1,
TITLE='Probability scale with country names',
LOC=4.5IN,0IN CSIZE=1.2,
YLAB='Expected Value for Chi-square (4)'
END
The output is:

[Figure: two chi-square probability plots of DIAGONAL, titled Midrange Smoother and Probability scale with country names]
The point furthest from the midrange line is Portugal. 
Example 8 


Exponential and Gamma Probability Plots 


You can plot your data against quantiles of an exponential or gamma distribution. The
exponential distribution function is:

$$F(y) = 1 - e^{-y/s} \qquad (4)$$

where s is a spread parameter. In the transposed probability plot, the slope of the line
through the plotted points is an estimate of s.
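Inverting equation (4) gives the quantile function $Q(p) = -s \ln(1 - p)$, so sorted exponential data plotted against the standard ($s = 1$) quantiles fall near a line of slope s. A Python sketch, assuming plotting positions (i - 0.5)/n and a least-squares line through the origin (a line with an intercept would also work for shifted data):

```python
import math


def exponential_prob_points(values):
    """(expected standard exponential quantile, data value) pairs.

    The standard quantile is -ln(1 - p); plotting positions (i - 0.5)/n
    are an assumed convention.
    """
    ys = sorted(values)
    n = len(ys)
    return [(-math.log(1 - (i + 0.5) / n), y) for i, y in enumerate(ys)]


def spread_estimate(points):
    """Least-squares slope through the origin of data on expected quantiles.

    For exponential data the points lie near y = s * q, so the slope estimates s.
    """
    return sum(q * y for q, y in points) / sum(q * q for q, _ in points)
```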

A gamma distribution is a transformed chi-square with real degrees of freedom. You 
must specify a shape parameter. Chambers, Cleveland, Kleiner, and Tukey (1983) 
discuss applications of gamma probability plots in univariate models, and 
Gnanadesikan (1977) shows how to use them for analyzing multivariate data. 

The following are some data from an unpublished memo by Taylor cited in Maltz 
(1984) and elsewhere. The data (in the file PAROLE) record the number of Illinois 
parolees observed to have failed conditions of their parole each month after release. 
(Each case in the file has one entry for MONTH and one for COUNT.) An additional 
149 parolees were observed to have failed after 22 months, but the data are not graphed
beyond this point. 


[Table: number of parole failures (COUNT) for each MONTH after release, through month 22]


The following are exponential and gamma probability plots of these data. Notice that
we weight the cases to include replicate values. 


The input is: 


USE PAROLE 
BEGIN
FREQ COUNT
PPLOT MONTH / TRANS EXPO PSCALE,
YLAB='Expected Fraction of Data' FILL=1,
TITLE='Exponential probability plot',
LOC=-1IN,0IN SIZE=1.3
PPLOT MONTH / TRANS GAMMA=.5 PSCALE,
YLAB='Expected Fraction of Data' FILL=1,
TITLE='Gamma probability plot',
LOC=5IN,0IN SIZE=1.3
END
The output is:

[Figure: Exponential probability plot and Gamma probability plot of MONTH; x axis: Expected Fraction of Data]


The exponential probability model appears to fit poorly in the tails (especially in the 
lower tail). In the gamma plot, the recidivism data are plotted against a gamma 
distribution with shape parameter equal to 0.5. 
Scatterplots


Leland Wilkinson 


Scatterplot (PLOT command) produces bivariate scatterplots, 3-D scatterplots, and 
other plots of continuous variables against each other (or against a categorical 
variable). Various options allow you to incorporate confidence ellipses for the sample 
or centroid, kernel density estimators, convex hulls, Voronoi tessellations, Delaunay 
triangulations, minimal spanning trees, the traveling salesman algorithm for the 
shortest path connecting points, or a vector option that connects each point to a 
specific point. You can also request a bubble plot, an influence plot, a multiplot, or a 
high-low-close display. 

You can rotate 3-D plots and also fine tune displays by using the Graph Properties 
dialog box, which you can get by right-clicking the graph in the Graph Editor and then 
selecting the Properties option. This dialog allows you to change plot shape, range 
limits for axes, values of a tension parameter for a smoother, and so on. 

You can add a regression line with an optional confidence band, or one of 19 smoothers
for 2-D displays (10 for 3-D displays). Another option generates residuals
automatically.

You can plot subpopulations in separate frames or overlay them in a single display. 
When groups are specified, features such as smoothers, ellipses, and hulls are 
displayed separately for each group. You can specify symbols to identify group 
membership or label each case (point) with a unique name (up to 12 characters). 

SYSTAT provides options to plot in polar, cylindrical, spherical, and triangular 
coordinates. 
Sample Displays

[Figure: sample scatterplot displays; axis label: PETALLEN]
Scatterplot Dialog Box


Scatterplot creates a wide variety of bivariate scatterplots, 3-D scatterplots, surface 
plots, and contour plots. 


To open the Scatterplot dialog box, from the menus choose: 


Graph 
Scatterplot... 


[Figure: Graph: Scatterplot dialog box]
The structure of a scatterplot depends on the variables that you select. For example,
select an x variable and a y variable to obtain a bivariate scatterplot, or add a z variable
to obtain a 3-D plot.

Options allow you to add a regression line with optional confidence band, 
ellipses, hulls, lines, and partitions. You can also plot residuals, or plot polar, 
cylindrical, spherical, and triangular coordinates. For example, you can create two- 
and three-dimensional triangle plots that display three variables in two dimensions, 
or four variables in three dimensions. 


X, Y, and Z variable(s). Variable(s) you want to plot. 

Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 

Repeated trials. Select two or more continuous y variables that are measured on the 
same scale and select Repeated trials to plot the values of the selected variables. Trial 
names are displayed on the x axis. 

Matrix columns. Creates a display where x is the variable index, y is the case index, 
and z is the data value. Select one or more z variables without selecting an x or y 
variable to enable this option. 

Display as. If you select a z variable, you can choose between a 3-D scatterplot, a 2-D 
contour plot, or a 2-D mosaic plot. 


Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot. 
To use this option, you should select a grouping variable with only two categories. 
MultiPlot. Creates a table of two-dimensional scatterplots with grouping variable 
categories appearing along the top and left of the table and X-variable and Y-variable 
categories appearing along the right and bottom. This orientation is similar to a Trellis 
display. 

Univariate density display on border. For 2-D plots, you can produce plots bordered 
by histograms, gap histograms, frequency polygons, fuzzygrams, box plots, notched 
box plots, Dit or Dot displays, jittered dot density, density stripes, kernel and normal 
curves. 

Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side-by-side. Different symbols distinguish separate plots or 
subpopulations. 
In addition, you can add a smoother and customize the layout, axes, and appearance of
the graphs.


Plot Options 


[Figure: Plot Options tab of the Graph: Scatterplot dialog box]


Confidence ellipse. Draws Gaussian bivariate ellipses for the sample in each plot or 
Gaussian bivariate confidence intervals on the centroid. The difference between Ell 
and Elm is analogous to the standard deviation versus the standard error of the mean. 
■ Sample (Ell). The resulting ellipse is centered on the sample means of the x and y
variables. The unbiased sample standard deviations of x and y determine its major
axes, and the sample covariance between x and y determines its orientation. You can choose
the size of the ellipse by specifying a probability value between 0 and 1 (both
exclusive). If you make an extremely large ellipse (0.99), it may extend beyond the
axes of your plot. The default is 0.6827.

■ Centroid (Elm). As with Ell, the ellipse produced by Elm is centered on the sample
means of the x and y variables. The unbiased sample standard deviations of x and 
y determine its major axes and the sample Pearson correlation between x and y 
determines its orientation. You can choose the size of the ellipse by specifying a 
probability value between 0 and 1 (both exclusive). This size is adjusted by the 
sample size so that the ellipse is always smaller than that produced using Ell. The 
default is 0.95. 
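A Gaussian ellipse can be traced by mapping a circle through a square root (here the Cholesky factor) of the covariance matrix. The Python sketch below is not SYSTAT's code: it assumes the radius $\sqrt{-2\ln(1-p)}$, which encloses probability p under a bivariate normal (chi-square with 2 df), and it shrinks the covariance by n for the centroid ellipse, by analogy with a standard error. SYSTAT may scale either ellipse differently.

```python
import math


def ellipse_points(xs, ys, p=0.6827, centroid=False, k=64):
    """k points on a Gaussian bivariate ellipse (a sketch, not SYSTAT's code).

    Assumed scaling: radius sqrt(-2 ln(1 - p)); with centroid=True the
    covariance is divided by n, as for a standard error.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if centroid:
        sxx, syy, sxy = sxx / n, syy / n, sxy / n
    # Cholesky factor L of the 2x2 covariance, so that cov = L L^T
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    c = math.sqrt(-2 * math.log(1 - p))
    pts = []
    for i in range(k):
        t = 2 * math.pi * i / k
        u, v = c * math.cos(t), c * math.sin(t)
        pts.append((mx + l11 * u, my + l21 * u + l22 * v))
    return pts
```

Every point returned has squared Mahalanobis distance $-2\ln(1-p)$ from the means, so the ellipse's orientation follows the covariance while its size follows p.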

Confidence kernel(p). Nonparametric kernel density estimator, analogous to a
continuous histogram, that shows where the data are most concentrated in the sample.
You can specify a probability value between 0 and 1 (both exclusive).


Convex hull around all points. You can draw a convex hull (the smallest convex polygon
that contains every point) around all the points in the scatterplot.
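SYSTAT's hull algorithm is not described here; Andrew's monotone chain is one standard way to compute a 2-D convex hull, shown as a Python sketch:

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 when o -> a -> b turns counterclockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]
```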

Influence on correlation coefficient. Makes the size of the plot symbol represent the
extent of influence each point exerts on the Pearson correlation coefficient. A scale to
the right of the plot helps you judge the extent of influence. (The influence of a point
is the amount the correlation would change if that point were deleted.)
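The deletion definition of influence translates directly into code. A Python sketch; the sign convention (r with all points minus r without the point) is an assumption, and SYSTAT may use a different one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_influence(xs, ys):
    """Change in r when each point is deleted (r_all - r_without_i)."""
    r_all = pearson_r(xs, ys)
    out = []
    for i in range(len(xs)):
        out.append(r_all - pearson_r(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]))
    return out
```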

Overlapping data. When many points overlap on a scatterplot, you can add a slight
random jitter to make the points easier to distinguish, or draw a sunflower plot, where
each symbol or "sunflower" represents one or more cases. Sunflowers are lighter or
darker depending on the number of cases. Only nine symbols are possible, so larger
counts are plotted with a filled circle.

Connectors/partitions. Draw lines connecting or partitioning points in a number of
ways.

■ Line connected in case order. Connects the points in the order in which the cases
appear in the data file.

■ Traveling salesman path. Tries to find the shortest possible closed path that
connects all the points, with no repetitions.

■ Minimum spanning tree. Connects a set of points to minimize the sum of the
lengths of the connecting line segments.

■ Vertical spikes to Y. Draws a vertical spike from each point to a specified y value.

■ Vector lines from. Connects each point to a single point that you specify.

■ Delaunay triangulation. Partitions the plane into triangles by joining each pair of
points whose Voronoi polygons share an edge; the result is the dual of the Voronoi
tessellation.

■ Voronoi tessellation. Also known as the Dirichlet tessellation or the Thiessen
diagram, this option draws straight boundaries halfway between neighboring points, so
that each polygon encloses the region closest to one point. This option presumes that
the x and y axes use equivalent distance units.
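A minimum spanning tree can be built with Prim's algorithm: grow the tree from one point, always adding the shortest edge to a point not yet connected. A Python sketch on Euclidean distances; SYSTAT's own algorithm is not described here, so this is one standard method rather than the program's implementation.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances; returns a list of (i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the closest point that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```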


Use hexagonal binning. You can select this check box to apply hexagonal binning for 
the 2-D plot. 


■ Number of hex grid cuts. When the Use hexagonal binning check box is selected,
you can specify the number of cuts in the grid of hexagonal binning from this edit 
box. The maximum number of cuts is 50 and the minimum is 2. The default value 
is 25. 
Smoother


Smoother provides 19 different types of smoothers to fit lines to two-dimensional 
displays. (Kriging is not shown.) 


[Figure: examples of 2-D smoothers: Linear, LOWESS, DWLS, Quadratic, Log, Power, NEXPO, Inverse, Median, and Bisquare]
Ten of the methods can also be used to add surfaces to three-dimensional displays.

[Figure: examples of 3-D smoothed surfaces, including Quadratic]

Andrews, Bisquare, Huber, and Kriging are not shown.
The different types of smoothers are available in the Smoother tab of the Scatterplot
dialog box.

[Figure: Smoother tab of the Graph: Scatterplot dialog box]


Tension. For the LOWESS, DWLS, inverse, mode, median, mean, and trimmed
smoothing methods, you can specify the degree to which the line or surface is allowed
to flex locally to fit the data. Specify a value between 0 and 1 (both inclusive). The
default setting is 0.5, meaning that half the points are included in the running window.
You can increase or decrease this value to increase or decrease the number of points in
the window.
Linear. Fit $E[y] = a + bx$, where a is a constant term and b is a slope coefficient.

Quadratic. Fit $E[y] = a + bx + cx^2$, where a is a constant and b and c are slope
coefficients.

Log. Fit $E[y] = a + b\ln(x)$.

Power. Fit $E[y] = ax^b$.
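The linear, log, and power forms can all be reduced to a simple least-squares line on transformed variables. A Python sketch; fitting the power model on the log-log scale is an assumption (it is not the same as least squares on the original scale, and SYSTAT may fit it differently).

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for E[y] = a + b x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def fit_log(xs, ys):
    """E[y] = a + b ln(x): a straight-line fit on ln(x)."""
    return fit_linear([math.log(x) for x in xs], ys)


def fit_power(xs, ys):
    """E[y] = a x**b, fitted here by least squares on ln(y) = ln(a) + b ln(x)."""
    la, b = fit_linear([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(la), b
```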

LOWESS. Produces a smooth curve by running along the x values and finding
predicted values from a weighted fit to nearby points. The curve is allowed
to flex locally to fit the data better. Use Tension to specify the amount of local flex.
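The core of LOWESS is a weighted straight-line fit in a window around each x value. The Python sketch below is a minimal one-pass version: `frac` stands in for Tension (the fraction of points in each window) and the weights are tricube weights as in Cleveland's method. It omits the robustness iterations of the full method, and SYSTAT's implementation details may differ.

```python
def lowess(xs, ys, frac=0.5):
    """One-pass LOWESS sketch: local linear fit with tricube weights.

    frac plays the role of Tension (fraction of points in each window);
    Cleveland's robustness iterations are omitted.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))  # points in each local window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # window half-width
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + b * (x0 - mx))
    return fitted
```

Because each local fit is a straight line, the smoother reproduces exactly linear data; larger `frac` values give smoother, stiffer curves.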


DWLS (distance-weighted least squares). Fits a curve through a set of points by locally
weighted least squares. Use Tension to specify the amount of local flex.

Spline. Fit several sections of the curve with cubic equations
($y = a + bx + cx^2 + dx^3$, where a is a constant, and b, c, and d are coefficients).
These curves are joined smoothly at "knots," which usually coincide with the data
points themselves.


Step. Start at the point with the smallest x value and drop (or rise) to the point with the 
next larger x value. 


NEXPO. Negative exponentially weighted smoothing. NEXPO fits a curve through a 
set of points such that the influence of neighboring points decreases exponentially with 
distance. It is an alternative to Spline smoothing for interpolation, closely related to 
DWLS smoothing. 


Inverse. Inverse squared distance smoothing, similar to NEXPO except that no
regression estimation is used. The height of the curve at a smoothing point is the
weighted average of the y values, where each weight is the inverse of the squared
Euclidean distance from the data point to the smoothing point on the x axis. Use
Tension to specify the amount of local flex. This is sometimes called Shepard's method
of interpolation.
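Shepard-style inverse-distance weighting can be sketched in a few lines of Python. This sketch uses all points with weights $1/|x - x_0|^2$ and ignores Tension; how SYSTAT restricts the neighborhood is not described here.

```python
def inverse_distance_smooth(xs, ys, grid, power=2):
    """Inverse-distance weighting along the x axis (Shepard-style sketch).

    Each smoothed height is a weighted average of all y values with weights
    1 / |x - x0|**power; a smoothing point that coincides with a data point
    returns that point's y value (the mean if several share it).
    """
    out = []
    for x0 in grid:
        exact = [y for x, y in zip(xs, ys) if x == x0]
        if exact:
            out.append(sum(exact) / len(exact))
            continue
        w = [1 / abs(x - x0) ** power for x in xs]
        out.append(sum(wi * y for wi, y in zip(w, ys)) / sum(w))
    return out
```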

Mean, Median, and Mode. Also known as moving average or running median, these 
options use the mean, median, or mode of the surrounding points. Use Tension to 
specify the size of the window. Mode is particularly useful to highlight the presence of 
subpopulations. 
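A running median replaces each point's height with the median of the y values in a window of nearby points. In the Python sketch below the window holds the round(tension * n) nearest points in x, an assumed rule rather than SYSTAT's documented one; swapping `median` for `statistics.mean` or `statistics.mode` gives the other two variants.

```python
from statistics import median


def running_median(xs, ys, tension=0.5):
    """Running-median smoother: at each x, the median y of the nearest
    round(tension * n) points (window rule assumed)."""
    n = len(xs)
    k = max(1, round(tension * n))
    out = []
    for x0 in xs:
        idx = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        out.append(median(ys[i] for i in idx))
    return out
```

Unlike a running mean, the median ignores a single extreme value inside the window.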


Midrange. Draws a straight line between the points marking the first and third 
quartiles. 
Andrews, Bisquare, Huber, and Trimmed. These methods use functions to
downweight the influence of cases with extreme residuals on the estimates of a and b
in the regression $y = a + bx$.


Kriging. Produces the best linear unbiased estimation of a stochastic process by
generalized least-squares. You can specify either Angle (direction of the major axis of
anisotropy; between 0 and 360, where the angle is 0 for an isotropic case), Order (where
0 = stationary, 1 = 1st-order non-stationary, and 2 = 2nd-order non-stationary), or Ratio
(ratio of the major/minor axis of anisotropy, where 1 is the ratio for an isotropic case).


Limit smoother domain to data range. Limits the smoother to the range of the data. 
This option is available for two-dimensional displays. 


Confidence interval on regression line. Gives a confidence interval of specified value
as an area between two regression lines. This option is available for two-dimensional
displays.
Residuals
Residuals plots the standardized residuals of the dependent variable.

[Figure: Residuals tab of the Graph: Scatterplot dialog box]


SYSTAT derives residuals for any of the 19 smoothers described previously. 
High-Low-Close Plot


To open the High-Low-Close Plot dialog box, from the menus choose:


Graph
Summary Charts
High-Low-Close Plot...


[Figure: Graph: Summary Charts: High-Low-Close Plot dialog box]
High-low-close plots are most commonly used to track high, low, and closing stock
market prices. A series of vertical bars displays the high-low range for a given time 
period, with a marker to indicate the closing value. 


High, Low, and Close variables. Variables that contain (respectively) the high, low, and 
close values for each case or time period. 


X-variable. Variable that defines the intervals or categories on the x axis. This variable 
can be numeric or string. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 

Mirror (Dual). Displays subpopulations in a single frame with an upper and lower plot.
To use this option, you should select a grouping variable with only two categories. 


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side-by-side. Different colors, patterns, or symbols distinguish 
separate plots or subpopulations. 
In addition, you can change the coordinate systems, add a smoother, and customize the
layout, axes, and appearance of the graphs.
Scatterplots


To create a graph, first tell SYSTAT what data to use: 


USE filename 


Then continue as follows: 


2-D scatterplots                 PLOT varlist / options
                                 PLOT yvarlist * var / options
3-D scatterplots                 PLOT zvarlist * yvar * xvar / options
Surface plot                     PLOT zvarlist * yvar * xvar / SMOOTH=type
Contour plots                    PLOT zvarlist * yvar * xvar / CONTOUR
Triangular coordinates display   PLOT zvarlist * yvar * xvar / TRI
                                 PLOT wvarlist * zvar * yvar * xvar / TRI
Matrix                           PLOT varlist / MATRIX options
Repeated trials                  PLOT yvarlist / REPEAT YPOW
Multiplot                        PLOT yvar * xvar / GROUP=gvar1,gvar2, MULTIPLOT


Examples


Example 1
Labels in Scatterplots


In various international cities, how long must people work to earn enough to buy a Big
Mac? How does this time relate to the length of a typical work week? We plot
BIG_MAC, the working time (in minutes) to buy a Big Mac, against WORKWEEK, the
length of the work week (in hours). The data are in the RCITY file, which has 46 cases,
one for each city.


The input is: 
USE RCITY
PLOT BIG_MAC * WORKWEEK / LABEL=CITY$ FILL=1 CSIZE=1.2
The output is:


With a quick glance at the plot, we question whether the data for Mexico City are accurate.
Does the average person in Mexico City have to work more than four hours to earn 
enough money to purchase a Big Mac? Notice that Hong Kong has the longest work 
week (45.7 hours); its average working time to purchase a Big Mac is 24 minutes. 


Zooming in on Points 


Most of the cities fall in the lower left corner of the plot. Limiting the scale range of 
both axes allows us to zoom in on plot points that fall within a certain range. 


The input is:


USE RCITY 

PLOT BIG_MAC*WORKWEEK/LABEL=CITY$ , XMAX=40, YMAX=60, YMIN=15, 
YLABEL='Work Time (minutes) for Big Mac', 
XLABEL='Average Work Week (hours)', 
CSIZE=1.2,FILL=1 
The output is:

[Figure: scatterplot with city labels; x axis: Average Work Week (hours); y axis: Work Time (minutes) for Big Mac]


Jittering Points 
You can add a small amount of uniform random error to the location of each point. 


The input is: 


USE RCITY 

PLOT BIG_MAC*WORKWEEK/LABEL=CITY$, XMAX=40, YMAX=60, YMIN=15,
YLABEL='Work Time (minutes) for Big Mac',
XLABEL='Average Work Week (hours)', JITTER,
CSIZE=1.2, FILL=1


Each time we repeat this plot request, the random perturbation changes. 
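Jittering adds a small uniform random offset to each coordinate. A Python sketch; the jitter amounts are parameters chosen by the caller (SYSTAT chooses its own size), and passing a fixed seed makes the perturbation repeatable, unlike the behavior described for the plot.

```python
import random


def jitter(points, xamount, yamount, seed=None):
    """Add uniform random noise in [-amount, +amount] to each coordinate.

    The amounts (for example, a small fraction of each axis range) are an
    assumption; SYSTAT chooses its own jitter size.
    """
    rng = random.Random(seed)
    return [(x + rng.uniform(-xamount, xamount), y + rng.uniform(-yamount, yamount))
            for x, y in points]
```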
The output is:

[Figure: jittered scatterplot with city labels; x axis: Average Work Week (hours)]

Example 2
Categorical Variables


Using the RCITY data, we plot the y variable WORKWEEK against the categorical
x variable REGION$, listing the regions (Africa, Europe, M. & S. America, N.
America, or Pacific/Asia) associated with each city. For the second plot, we specify the
order in which the regions appear on the x axis and also add a LOWESS smoother.
The input is:

USE RCITY
BEGIN
PLOT WORKWEEK * REGION$ / FILL=1 LOC=-2.5IN,0IN
ORDER REGION$ / SORT='Europe', 'Africa',
'N. America', 'M.&S.America',
'Pacific/Asia'
PLOT WORKWEEK * REGION$ / SMOOTH=LOWESS YLAB=' ',
FILL=1 LOC=2.5IN,0IN
END
The output is:

[Figure: two plots of WORKWEEK by region, the second with a LOWESS smoother]


The LOWESS smooth draws a regression line connecting the weighted average of 
WORKWEEK values from region to region. 


Example 3 
Case Numbers in Scatterplots 


If we specify one variable as the y variable without specifying an x variable, SYSTAT
plots each value against its case number. We use the OURWORLD data file, which
contains 57 countries (cases) and includes variables such as BABYMORT (the number
of babies out of 1,000 who die within the first year of life). In the first plot, we connect
the plot points with a dashed line. In the second plot, we label the points with country
names to see the countries corresponding to each case number.


The input is: 
USE OURWORLD 


BEGIN 
PLOT BABYMORT / LINE DASH=11 HEI=2.5IN WID=3.2IN,
FILL=1 LOC=0IN,2IN


PLOT BABYMORT / LABEL=COUNTRY$ HEI=2.5IN WID=3.2IN,
FILL=1 CSIZE=1.5 LOC=0IN,-1.5IN


END 
The first 20 cases are the European countries; the next 16, Islamic; and the last 22, New
World. 


Example 4 
Transformations 


Do urban countries have more McDonald's restaurants? We select MCDONALD as the
y variable and URBAN (the percentage of the population residing in cities) as the
x variable. Many values are close to 0, so we move the plot points away from the axis
borders.


The input is: 


USE OURWORLD 
PLOT MCDONALD * URBAN / TICK=INDENT XMIN=0 FILL=1
The output is:

[Figure: scatterplot of MCDONALD against URBAN]


Very few countries have many McDonald's restaurants, so it's hard to see if any
relation exists among the points scrunched at the bottom of the plot. We use the log and
power transformations in an effort to improve the representation.


The input is: 


BEGIN 
PLOT MCDONALD * URBAN / YPOW TICK=INDENT XMIN=0 FILL=1,
TITLE='Power' LOC=-2.7IN,0IN
PLOT MCDONALD * URBAN / YLOG TICK=INDENT XMIN=0 YLAB=' ',
FILL=1 TITLE='Log' LOC=2.7IN,0IN


END 
The output is:

[Figure: two plots of MCDONALD against URBAN, titled Power and Log]


Except for the four points on the lower right of the log plot, the relation appears fairly 
linear. Now we add the names; then we use the first letter of the name of each country 
as the plot symbol. We'll also add a LOWESS curve. 


The input is: 


BEGIN 
PLOT MCDONALD*URBAN/ YLOG, LABEL=COUNTRY$, TICK=INDENT, FILL=1,
XLAB='Percent of Population in Cities',
YLAB="Number of McDonald's Restaurants",
CSIZE=1.5, SIZE=1.2, LOC=-3IN,0IN
PLOT MCDONALD*URBAN/ YLOG, SYMBOL=COUNTRY$, SMOOTH=LOWESS, TICK=INDENT,
XLAB=' ', YLAB=' ', SIZE=1.2, LOC=3IN,0IN
END 




The output is:

[Figure: two plots of MCDONALD (log scale) against URBAN, one labeled with country names and one using the first letter of each name as the symbol, with a LOWESS curve; x axis: Percent of Population in Cities]
Example 5 


Subpopulations and Grouping 


Suppose we want to compare female life expectancy and dollars spent per person on
health for the three groups of countries. Use HEALTH as the x variable, LIFEEXPF as
the y variable, and GROUP$ as the grouping variable. Additionally, specify XLOG for
the x axis and label the plot points by country.


The input is: 


USE OURWORLD 
PLOT LIFEEXPF * HEALTH / XLOG GROUP=GROUP$ ROW=2,
LABEL=COUNTRY$ CSIZE=1.5 FILL=1
The output is:

[Figure: plots of LIFEEXPF against HEALTH for each group of countries, including Europe]


We can clearly see that European countries spend more on health and have higher female life expectancy than Islamic and New World countries.


Displaying Groups in a Single Frame 
We overlay multiple graphs into a single frame to display the groups in a single plot. 


The input is: 


USE OURWORLD 

PLOT LIFEEXPF * HEALTH / XLOG GROUP=GROUP$ OVERLAY, XFORM=0, 
XLAB='Dollars (per person) for Health', 
YLAB='Female Life Expectancy' FILL=1, 
SYMBOL=4,5,1 COLOR=BLUE, BLACK, RED 
The output is:

[Overlaid plot of the three groups: Female Life Expectancy against Dollars (per person) for Health]

Plotting Features by Subgroup 
When you use grouping variables and overlay multiple graphs into a single frame, plot features are drawn separately for each group. We add the LOWESS smoother and the HULL option.

With commands, we can also specify characters to use as plot symbols with the SYMBOL option. In the following plots, we use the first letter of the grouping variable values (SYMBOL='E','I','N') as plot points.


The input is: 


BEGIN
PLOT LIFEEXPF*HEALTH/ XLOG, GROUP=GROUP$, OVERLAY,
SYMBOL='E','I','N', SMOOTH=LOWESS,
XLAB='Dollars (per person) for Health',
YLAB='Female Life Expectancy (in years)',
LEGEND=NONE, COLOR=BLUE, GREEN, RED,
LOC=-3IN,0IN, TITLE='LOWESS smooths'
PLOT LIFEEXPF*HEALTH/ XLOG, GROUP=GROUP$, OVERLAY,
SYMBOL='E','I','N', HULL,
XLAB='Dollars (per person) for Health',
YLAB='Female Life Expectancy (in years)',
LEGEND=NONE, COLOR=BLUE, GREEN, RED,
LOC=3IN,0IN, TITLE='Convex Hulls'
END
The output is:

[Two panels, LOWESS smooths and Convex Hulls: female life expectancy against Dollars (per person) for Health]

Example 6 
Multiple Y Variables 


SYSTAT allows you to specify two y variables and plot them in a side-by-side display for easy comparison. Here we display female and male life expectancy (LIFEEXPF, LIFEEXPM) against HEALTH.


The input is: 
USE OURWORLD 


PLOT LIFEEXPF LIFEEXPM * HEALTH / XLOG XFORM=0 SYMBOL=21,20,
SIZE=2 FILL=.75


The output is:

[Side-by-side plots: LIFEEXPF and LIFEEXPM against HEALTH]

Example 7
Repeated Measures 


In this example, we examine expenditures (in U.S. dollars) that countries make on 
education (EDUC), health (HEALTH), and the military (MIL). Each country has three 
repeated measures and is identified as European, Islamic, or New World. We initially 
plot the repeated measures ignoring the group structure, and then we use GROUP$ as
the grouping variable to stratify the sample.


The input is: 


USE OURWORLD 
PLOT EDUC MIL HEALTH / REPEAT YPOW FILL=1 
PLOT EDUC MIL HEALTH / REPEAT YPOW FILL=1 GROUP=GROUP$ ROW=1 


The output is: 
Example 8
Bubble Plots 


You can use the value of a variable to control the size of a plot symbol. This is useful for
representing a third variable in a two-dimensional plot. These plots are sometimes
called bubble plots. Cleveland, Kleiner, McRae, and Warner (1976) use open circles to 
represent levels of pollution on a map of New England; Bickel, Hammel, and 
O'Connell (1975) use open squares to represent the size of university departments in 
plotting admissions data at the University of California, Berkeley. 

One caution: The size of the plot symbols is taken from the values in your variable. 
If a value is as small as 0.001 or is negative, the point will be invisible; if a value is as 
large as 100, the symbol will fill the entire plot. Try a range between 0 and 10. If your 
sizing variable does not lie in this range, you should rescale it. Finally, use empty 
symbols with this type of plot because filled ones can occlude each other and make the 
plot difficult to interpret. 

We plot MCDONALD against URBAN, using bubbles to represent population. 


The input is: 


USE OURWORLD
STAND POP_1990 / RANGE
LET POP_1990 = 5*POP_1990
PLOT MCDONALD * URBAN / SIZE=POP_1990 SYMBOL=1 FILL=1 YLOG,
LEGEND=NONE TICK=INDENT


The output is:

[Bubble plot: MCDONALD (log scale) against URBAN, symbol size proportional to POP_1990]

The largest circle represents Brazil's population of 152 million. The countries above
Brazil are Canada, the U.K., Germany, and France. The U.K., Germany, and France 
range in size from 56 to 62 million people. Notice that the populations of Turkey and 
Italy are similar (their circles are the same size). Turkey has 11 McDonald's restaurants 
with 45% of the population living in cities; Italy has 14 McDonald's with 69% living 
in cities. 


Example 9 
Three-Dimensional Scatterplots 


Specify an x variable, a y variable, and a z variable to plot three variables in a three-dimensional display.
The following is a default plot of LITERACY (percentage of the population who can read) against a plane formed by DEATH_RT and BIRTH_RT.


The input is: 
USE OURWORLD 


PLOT LITERACY * DEATH_RT * BIRTH_RT / FILL=1


The output is:

[3-D scatterplot: LITERACY against DEATH_RT and BIRTH_RT]


There is little visual information in this display; however, we can add perspective to the plot with grid lines. We can also change the display of the graph border to better define the floor of the display, and we can draw vertical spikes from the points to the floor. Finally, we can control the height, width, and depth of the graph.
The input is:
PLOT LITERACY*DEATH_RT*BIRTH_RT/XGRID, YGRID, AXES=SPOON, SPIKE,
WIDTH=3.5IN, HEIGHT=2.5IN, ZMIN=0,
ZMAX=100, ALT=2IN, XTICK=12, FILL=1


The output is:

[3-D plot with grid lines and spikes: LITERACY (0 to 100) against DEATH_RT and BIRTH_RT]


Are there short spikes hidden behind the taller spikes? Now we'll reverse the scale of 
the x and y axes and rotate the display 180 degrees. 


The input is: 
PLOT LITERACY*DEATH_RT*BIRTH_RT/XGRID, YGRID, AXES=SPOON, SPIKE,
XREV, YREV, WIDTH=3.5IN,
HEIGHT=2.5IN, ZMIN=0, ZMAX=100,
ALT=2IN, XTICK=12, FILL=1


The output is:

[3-D plot with reversed x and y axes: LITERACY against DEATH_RT and BIRTH_RT]

Now we can see that LITERACY is lowest for countries with high birth and death rates.


Adding an Inverse Smoother 


Adding a surface to a 3-D display can help characterize the relationship among the variables. Here we add an inverse smoother.


The input is: 


PLOT LITERACY*DEATH_RT*BIRTH_RT/XGRID, YGRID, AXES=SPOON, SPIKE,
XREV, YREV, SMOOTH=INVERSE, ZMIN=0,
ZMAX=100, ALT=2IN, WIDTH=3.5IN,
HEIGHT=2.5IN, XTICK=12


The output is:

[3-D plot with inverse-smoothed surface: LITERACY against DEATH_RT and BIRTH_RT]


What would the surface look like if we relaxed the tension? 


The input is: 


PLOT LITERACY*DEATH_RT*BIRTH_RT/XGRID, YGRID, AXES=SPOON, SPIKE,
XREV, YREV, SMOOTH=INVERSE,
TENSION=.2, CUT=40, ZMIN=0, ZMAX=100,
ALT=2IN, WIDTH=3.5IN, HEIGHT=2.5IN,
XTICK=12
The output is:

[3-D plot with inverse-smoothed surface at tension 0.2: LITERACY against DEATH_RT and BIRTH_RT]


Example 10 
Bordered Displays and Transformations 


Here is an example using the OURWORLD data with 1990 population estimates (POP_1990) and population projections for the year 2020 (POP_2020) for 57 countries. We use univariate density displays on the border to show population box plots outside the scatterplot. In the second graph, we transform the data to log base 10 units.


The input is: 
USE OURWORLD
BEGIN
PLOT POP_2020 * POP_1990 / BORDER FILL=1 LOC=-3IN,0IN
PLOT POP_2020 * POP_1990 / BORDER XLOG YLOG TICK=INDENT,
FILL=1 LOC=3IN,0IN
END
The output is:

[Left: POP_2020 against POP_1990 with box plots on the borders; right: the same plot in log units]


The univariate distribution of POP_1990 is displayed in a box plot along the top of the left display. The distribution is severely right-skewed. It has three far outside values (open circles) and five outside values (asterisks). The box plot for POP_2020 is along the right side of the plot. It is also severely right-skewed, with three far outside values and one outside value. The points within the plot are clumped in the lower left corner with a loose upward diagonal scattering. These data are not suited for linear models or analyses based on normality.
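The outside and far outside values described above follow Tukey's box-plot fences. The Python sketch below assumes Tukey's hinges, with inner fences at 1.5 H-spreads and outer fences at 3 H-spreads beyond the hinges; SYSTAT's exact hinge computation may differ slightly, and the function name is hypothetical.

```python
import statistics

def classify_box_plot_values(data):
    """Return (outside, far_outside) values using Tukey's inner and outer fences."""
    x = sorted(data)
    n = len(x)
    # Tukey hinges: medians of the lower and upper halves
    # (the median itself belongs to both halves when n is odd).
    lower_hinge = statistics.median(x[: (n + 1) // 2])
    upper_hinge = statistics.median(x[n // 2:])
    h_spread = upper_hinge - lower_hinge
    inner = (lower_hinge - 1.5 * h_spread, upper_hinge + 1.5 * h_spread)
    outer = (lower_hinge - 3.0 * h_spread, upper_hinge + 3.0 * h_spread)
    outside, far_outside = [], []
    for v in x:
        if v < outer[0] or v > outer[1]:
            far_outside.append(v)    # drawn as open circles
        elif v < inner[0] or v > inner[1]:
            outside.append(v)        # drawn as asterisks
    return outside, far_outside
```

Applying the same function to log10 values of a right-skewed variable typically moves points back inside the fences, which is what the right-hand display shows.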

The box plots along the edges of the right plot indicate fairly symmetric 
distributions. Barbados is the lone small outlier for both distributions. These data are
more suited for analysis in log units.


Example 11 
Regression Line with Confidence Band 


SYSTAT can compute and draw the linear regression line y = a + bx for the points in your plot. The Confidence interval on a regression line option (CONFI) allows you to specify confidence bands. You can specify any level you want (for example, 0.87 for 87% confidence). The default is 0.95 (95% confidence).

SYSTAT draws upper and lower hyperbolic bands around the regression line. If the residuals (the differences between the estimated and observed values for y at each x) are normally distributed, independent of each other, and have the same variance (spread), then at the default 95% level, 95 times out of 100 the confidence intervals constructed by SYSTAT (from data sampled in the same way as the current values) will cover the true regression line relating y to x.
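The hyperbolic shape comes from the standard error of the fitted value, which grows with the distance of x from the mean of x. Below is a Python sketch of the usual pointwise band, ŷ ± t·s·sqrt(1/n + (x0 − x̄)²/Sxx). We assume SYSTAT uses this textbook form; the caller supplies the t critical value (for example, about 3.182 for 95% with 3 degrees of freedom) rather than the code computing it, and the function name is hypothetical.

```python
import math

def regression_band(xs, ys, x0, t_crit):
    """Lower limit, fitted value, and upper limit of the band for the mean of y at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y_hat = a + b * x0
    # Half-width is smallest at x0 = x_bar and grows quadratically under the root.
    half = t_crit * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half, y_hat, y_hat + half
```

Evaluating the function over a grid of x0 values and joining the lower and upper limits traces the two hyperbolic curves.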
In this example, we display 95% confidence intervals on the regression line of MIL (dollars countries spend per person on the military) on GDP_CAP (gross domestic product per capita).


The input is: 


USE OURWORLD 
BEGIN 
PLOT MIL*GDP_CAP/SMOOTH=LINEAR, CONFI, FILL=1, LOC=-3IN,0IN,
TITLE='Before transformations'
PLOT MIL*GDP_CAP/SMOOTH=LINEAR, CONFI, XLOG, YLOG, FILL=1,
LOC=3IN,0IN, TITLE='After transformations'
END 


The output is:

[Regression lines with 95% confidence bands, Before transformations and After transformations: MIL against GDP_CAP]


The point at the top left corner of the left display is Iraq. Libya is just slightly below it 
and to the right. After log transformations are made (right plot), Iraq and Libya remain 
at the top but are shifted closer to the fitted line. 


Example 12 
Influence Plot 


The influence of a point in a scatterplot on the correlation coefficient is the amount that the correlation would change if that point were deleted. Plotting influences helps us determine whether a linear fit is relatively robust or depends on just a few points. The Influence on the correlation coefficient option (INFLUENCE) makes the size of the plot symbol represent the extent of influence each point exerts on the Pearson correlation coefficient. A scale to the right of the plot helps us judge the extent of influence. If any large points appear in the plot, we should scrutinize them before drawing any conclusions concerning the correlation.
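The definition above (the change in r when one point is deleted) can be computed directly. This brute-force Python sketch refits r once per point, which is fine for a few dozen countries; the function names are hypothetical and the sign convention (r without the point minus r with it) is our choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_influence(xs, ys):
    """Change in r when each point is deleted: r(without point i) - r(all points)."""
    r_all = pearson_r(xs, ys)
    return [pearson_r(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) - r_all
            for i in range(len(xs))]
```

A single extreme point, like Iraq in the untransformed plot, shows up as the entry with by far the largest absolute influence.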


The input is: 


USE OURWORLD 
BEGIN 
PLOT MIL*GDP_CAP/INFLUENCE, FILL=1, LOC=4IN,0IN
PLOT MIL*GDP_CAP/XLOG, YLOG, INFLUENCE, FILL=1, LOC=-4IN,0IN
END
The output is:

[Influence plots of MIL against GDP_CAP, untransformed and in log units]

In the first plot, notice the extreme influence of Iraq (upper left corner) and of Libya.
After we transform the data, no influence point (that is, no country) stands out. 


Example 13 
Residuals and Transformations 


Which is the best data transformation? Plotting the standardized residuals of the 
dependent variable can help answer this question. Residual methods include: Linear, 
Quadratic, Log, Power, Mean, Median, Trimmed, Midrange, Andrews, Bisquare, 
Huber, LOWESS, DWLS, Spline, Step, NEXPO, Inverse, Mode, and Kriging. 

We plot MIL against GDP_CAP. SYSTAT computes residuals from a linear fit and 
plots them. We will also add a line where residuals are 0, and draw a convex hull 
around the scatter of points. 
The input is:
USE OURWORLD
BEGIN
PLOT MIL*GDP_CAP/RESID, YLIMIT=0, HULL, XLOG, FILL=1, LOC=-3IN,0IN,
TITLE='Before y transformation'
PLOT MIL*GDP_CAP/RESID, YLIMIT=0, HULL, XLOG, FILL=1, LOC=3IN,0IN,
TITLE='After y transformation', YPOW=0, YLAB=' '
END
The output is:

[Two panels, Before y transformation and After y transformation: residuals with convex hulls against GDP_CAP]

Residuals with Spikes 


To represent deviations from a constant value, plot vertical spikes. If you specify no value for SPIKE, lines are drawn from each plot symbol to the horizontal axis. If you specify a value, lines are drawn from each symbol to that level.


The input is: 


USE OURWORLD 

BEGIN 
PLOT MIL*GDP_CAP/RESID, YLIMIT=0, XLOG, SPIKE=0, FILL=1,
LOC=-2.7IN,0IN, TITLE='Before y transformation'
PLOT MIL*GDP_CAP/RESID, YLIMIT=0, SPIKE=0, XLOG, YPOW=0, FILL=1,
LOC=2.7IN,0IN, TITLE='After y transformation',
YLAB=' '
END 
The output is:

[Two panels, Before y transformation and After y transformation: residual spikes against GDP_CAP]

SPIKE is also available for 3-D displays.


Smoother through Residuals 


Are the values of the residuals randomly scattered across the range of the independent 
variable? Try adding a LOWESS smooth to the residuals. 


The input is: 


USE OURWORLD 


BEGIN 
PLOT MIL*GDP_CAP/RESID, YLIMIT=0, SMOOTH=LOWESS, XLOG, FILL=1,
LOC=-2.7IN,0IN, TITLE='Before y transformation'
PLOT MIL*GDP_CAP/RESID, YLIMIT=0, SMOOTH=LOWESS, XLOG, YPOW, FILL=1,
LOC=2.7IN,0IN, TITLE='After y transformation',
YLAB=' '
END 
Example 14
Confidence Ellipses 


Here, we examine measurements made on 150 iris flowers: sepal length, sepal width, petal length, and petal width (in centimeters). The data are from Anderson (1935), reprinted in Fisher (1936), and are grouped by species: Setosa, Versicolor, and Virginica (coded as 1, 2, and 3, respectively). We show a confidence ellipse for the centroid and for the sample.
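One common way to draw such ellipses is sketched in Python below, under a bivariate normal assumption: the sample ellipse is the contour where the squared Mahalanobis distance from the mean equals the chi-square quantile with 2 degrees of freedom, −2 ln(1 − p). The centroid version here simply divides the covariance by n, a normal approximation; SYSTAT's exact computation (for example, an F-based scaling) may differ, and the function name is hypothetical.

```python
import math

def confidence_ellipse(xs, ys, p=0.95, centroid=False, n_points=100):
    """Points on a normal-theory ellipse for the sample (or, approximately, the centroid)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if centroid:
        # Covariance of the sample mean (normal approximation).
        sxx, syy, sxy = sxx / n, syy / n, sxy / n
    c = -2.0 * math.log(1.0 - p)  # chi-square quantile, 2 degrees of freedom
    # Eigenvalues of the 2 x 2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    lam1, lam2 = half_trace + disc, half_trace - disc
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the major axis
    a, b = math.sqrt(c * lam1), math.sqrt(c * max(lam2, 0.0))
    points = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        u, v = a * math.cos(t), b * math.sin(t)
        # Rotate the axis-aligned ellipse by theta and shift it to the mean.
        points.append((mx + u * math.cos(theta) - v * math.sin(theta),
                       my + u * math.sin(theta) + v * math.cos(theta)))
    return points
```

The centroid ellipse is much smaller than the sample ellipse because the covariance of the mean shrinks by a factor of n.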


The input is: 
USE IRIS
BEGIN
PLOT SEPALLEN*SEPALWID/ELM=.95, FILL=1, LOC=-2.5IN,0IN,
TITLE='Centroid'
PLOT SEPALLEN*SEPALWID/ELL, SYMBOL=SPECIES, LOC=2.5IN,0IN,
TITLE='Sample', LEGEND=NONE, YLAB=' '
END
The output is:

[Two panels, Centroid and Sample: confidence ellipses for SEPALLEN against SEPALWID]


Subpopulations in Separate Frames 



We separate the data into the three iris species (Setosa, Versicolor, and Virginica) by assigning names to the numeric codes in the data file. Then we use SPECIES as the grouping variable to plot the three species.


The input is: 
USE IRIS 


LABEL SPECIES / 1='Setosa' 2='Versicolor' 3='Virginica' 


CATEGORY SPECIES 


PLOT SEPALLEN * SEPALWID / GROUP=SPECIES ELL ROW=2 FILL=1 


CATEGORY 
The output is:


Subpopulations within a Single Frame 
We now overlay the species in a single frame. 


The input is: 


USE IRIS 
BEGIN 
PLOT SEPALLEN * SEPALWID / GROUP=SPECIES ELL OVERLAY,
LOC=-2.7IN,0IN LEGEND=NONE,
COLOR=BLUE, RED, BLACK,
SYMBOL=4,5,1 FILL=1
PLOT PETALLEN * PETALWID / GROUP=SPECIES ELL OVERLAY,
LOC=2.7IN,0IN LEGEND=NONE,
COLOR=BLUE, RED, BLACK,
SYMBOL=4,5,1 FILL=1
END
The output is:

Example 15


Kernel Contours 


Bivariate nonparametric kernel density estimators are like continuous histograms and show areas where the data are most concentrated in the sample. SYSTAT draws a 2-D graph where the confidence kernel for a specified p value is drawn on the x-y plane. You can specify a p value between 0 and 1; the default is 0.6827.
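A bivariate Gaussian kernel density estimate can be sketched in Python as below. The bandwidths hx and hy are left to the caller, and choosing the contour level so that roughly a fraction p of the data points lie at or above it is our assumption about what the p value means, not a documented description of SYSTAT's algorithm. Function names are hypothetical.

```python
import math

def kde2d(xs, ys, hx, hy):
    """Return a function giving the Gaussian product-kernel density at (x, y)."""
    n = len(xs)
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)

    def density(x, y):
        total = 0.0
        for xi, yi in zip(xs, ys):
            u = (x - xi) / hx
            v = (y - yi) / hy
            total += math.exp(-0.5 * (u * u + v * v))
        return norm * total

    return density


def contour_level(xs, ys, density, p=0.6827):
    """Density level whose contour encloses roughly a fraction p of the points."""
    values = sorted(density(x, y) for x, y in zip(xs, ys))
    k = int((1.0 - p) * len(values))
    return values[min(k, len(values) - 1)]
```

Evaluating `density` on a grid and contouring it at `contour_level(...)` outlines the region where the data are most concentrated.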


The input is: 


USE IRIS 


BEGIN 
PLOT SEPALLEN*SEPALWID/KERNEL, LEGEND=NONE, FILL=1, LOC=-2.7IN,0IN
PLOT PETALLEN*PETALWID/KERNEL, LEGEND=NONE, FILL=1, LOC=2.7IN,0IN


END 
The output is: 
[Two kernel contour plots of the iris measurements]


Overlaying an Ellipse and Kernel Estimator 


Using a BEGIN...END command statement, we draw Gaussian bivariate ellipses on top of our kernel contours. The normal theory estimator can differ dramatically from the nonparametric estimator.


The input is: 


USE IRIS
BEGIN
PLOT SEPALLEN*SEPALWID/KERNEL, GROUP=SPECIES, ROW=1, LEGEND=NONE
PLOT SEPALLEN*SEPALWID/ELL, GROUP=SPECIES, ROW=1, SCALE=NONE,
XLABEL=' ', YLABEL=' ', LEGEND=NONE, FILL=1
END
BEGIN
PLOT PETALLEN*PETALWID/KERNEL, GROUP=SPECIES, ROW=1, LEGEND=NONE
PLOT PETALLEN*PETALWID/ELL, GROUP=SPECIES, ROW=1, SCALE=NONE,
XLABEL=' ', YLABEL=' ', LEGEND=NONE, FILL=1
END


The output is:

Example 16
LOWESS and DWLS 


If you want to do a regression of one variable on another but are not sure about the shape of the function, use DWLS (or LOWESS) smoothing first.

Linear, quadratic, and other parametric smoothers presuppose the shape of the 
function. LOWESS (Cleveland, 1979, 1981) requires only that the smooth be a function 
(that it have a unique y value for every x). It produces a smooth by running along the x 
values and finding predicted values from a weighted average of nearby y values. 
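The running local fit can be sketched in Python. This is a simplified version of Cleveland's method: at each x it fits a weighted straight line with tricube weights over the nearest frac·n points, and it omits the robustness iterations of the full LOWESS algorithm. Here `frac` plays the role of the tension parameter; the function name is hypothetical.

```python
import math

def lowess(xs, ys, frac=0.5):
    """Locally weighted linear fit at each x (no robustness iterations)."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        h = max(sorted(abs(x - x0) for x in xs)[k - 1], 1e-12)
        # Tricube weights; points at or beyond h get weight 0.
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
             for x in xs]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, xs))
        if sxx == 0:
            fitted.append(ym)  # only one distinct x carries weight: use the local mean
        else:
            sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, xs, ys))
            fitted.append(ym + sxy / sxx * (x0 - xm))
    return fitted
```

Because every local fit is a straight line, exactly linear data are reproduced exactly; with noisy data, a larger `frac` gives a stiffer curve.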

DWLS (distance weighted least-squares) smoothing fits a line through a set of 
points by least-squares. Unlike linear or low-order polynomial smoothing, however, 
the surface is allowed to flex locally to fit the data better. The amount of flex is 
controlled by the Tension parameter. This method produces a true, locally weighted 
curve running through the points using an algorithm attributed to McLain (1974). 

Because many computations are involved, both methods can be time consuming for large scatterplots. For example, for DWLS, every point on the smoothed line requires a weighted quadratic multiple regression on all the points.

Using the OURWORLD data, here are both smoothers of the number of McDonald’s restaurants against the proportion of the population living in cities. We use SYMBOL to label each point with the first letter of the country name and YLOG to plot the values on the y axis in log units.


The input is: 


USE OURWORLD 


BEGIN 
PLOT MCDONALD * URBAN / YLOG SYMBOL=COUNTRY$ SHORT TICK=INDENT,
SMOOTH=LOWESS LOC=-3IN,0IN,
TITLE='LOWESS'
PLOT MCDONALD * URBAN / YLOG SYMBOL=COUNTRY$ SHORT TICK=INDENT,
YLAB=' ' SMOOTH=DWLS LOC=2.5IN,0IN,
TITLE='DWLS'


END 
The output is:

[Two panels, LOWESS and DWLS: MCDONALD (log scale) against URBAN]


Notice the upward trend, indicating that the more urban countries have more 
McDonald's restaurants. 


Tension 


You can set a stiffness parameter for LOWESS or DWLS (also inverse, mode, median, 
mean, trimmed, spline, and kriging) smoothing by specifying a value for tension. 

The default tension value for LOWESS smoothing is 0.5, meaning that half the 
points are included in each running window. If you increase this value, the curve will 
be stiffer; if you decrease it, the curve will be looser, following local irregularities (use 
a value between 0 and 1). Here, we add tension values of 0.2 and 0.9 to the DWLS 
smoothing. 


The input is: 


BEGIN 
PLOT MCDONALD * URBAN / YLOG SYMBOL=COUNTRY$ SHORT TICK=INDENT, 
SMOOTH=DWLS LOC=-3IN,0IN,
TITLE='Tension=0.2' TENSION=.2
PLOT MCDONALD * URBAN / YLOG SYMBOL=COUNTRY$ SHORT TICK=INDENT,
YLAB=' ' SMOOTH=DWLS LOC=2.5IN,0IN,
TITLE='Tension=0.9' TENSION=.9 
END 
The output is:

[Two panels, Tension=0.2 and Tension=0.9: DWLS smooths of MCDONALD against URBAN]


In the right plot, tension is set at 0.9, meaning that 90% of the data are used to smooth 
each value on the curve. In the left plot, tension is 0.2. 


Example 17 
Linear, Quadratic, Log, and Power Smoothing 


Here we employ linear, quadratic, log, and power smoothing for the OURWORLD 
data. 


The input is: 


USE OURWORLD 
BEGIN 
PLOT MCDONALD * URBAN / SMOOTH=LINEAR YLOG FILL=1,
TITLE='Linear' LOC=-3IN,3IN
PLOT MCDONALD * URBAN / SMOOTH=QUADRATIC YLOG FILL=1,
TITLE='Quadratic' LOC=3IN,3IN
PLOT MCDONALD * URBAN / SMOOTH=LOG YLOG FILL=1,
TITLE='Log' LOC=-3IN,-3IN
PLOT MCDONALD * URBAN / SMOOTH=POWER YLOG FILL=1,
TITLE='Power' LOC=3IN,-3IN


END 
The output is:

[Four panels, Linear, Quadratic, Log, and Power smooths: MCDONALD against URBAN]


In order to emphasize the shapes of these smoothers, we did not limit the smoother 
domain to the data range. Notice that the quadratic smoother is concave upward, while 
the curves for log and power are the reverse. 


Example 18 
Spline, Step, NEXPO, and Inverse Smoothing 


Cubic splines are especially useful for interpolation when you need a computer French curve. Because a spline passes through every data point, you should use splines only when you believe your data contain no error. Otherwise, choose one of the regression methods (Linear, Quadratic, DWLS, or LOWESS). The tightness of the spline curve is controlled by a tension parameter, which determines how tightly the curve is pulled between the knots. Normally, the parameter is 2, but you can set it down to 0 to make it looser and up to 10 to make it tighter. Try several values on the same data to see what we mean.
In this example, we employ four smoothers useful for interpolation.


The input is: 


USE OURWORLD 
BEGIN 
PLOT MCDONALD * URBAN / SMOOTH=SPLINE YLOG FILL=1,
TITLE='Spline' LOC=-3IN,3IN
PLOT MCDONALD * URBAN / SMOOTH=STEP YLOG FILL=1,
TITLE='Step' LOC=3IN,3IN
PLOT MCDONALD * URBAN / SMOOTH=NEXPO YLOG FILL=1,
TITLE='NEXPO' LOC=-3IN,-3IN
PLOT MCDONALD * URBAN / SMOOTH=INVERSE YLOG FILL=1,
TITLE='Inverse' LOC=3IN,-3IN


END 


The output is: 


[Four panels, Spline, Step, NEXPO, and Inverse smooths: MCDONALD against URBAN]


The Spline plot shows a spline smooth through the data points. Circle plot symbols are 
used to show the point locations more clearly. In many circumstances, however, you 
might want to use no symbols and leave only a curve. 
If you want to drop (or rise) immediately from the first point to the second in a Step
plot, you need to shift the y data values one lag down the list for each x in your data. 


Inverse squared distance smoothing (for 3-D displays). Inverse smoothing is similar to NEXPO smoothing except that no regression estimation is used. The z height of the surface at a smoothing point is the weighted average of the z values of the data points, where the weights are the inverse squared Euclidean distances across x and y.
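Inverse squared distance smoothing at a single grid point can be sketched in Python as follows. It is the plain inverse-distance-weighting idea described above, with a hypothetical function name, and it does not model SYSTAT's tension setting.

```python
def inverse_distance_smooth(xs, ys, zs, gx, gy):
    """Inverse-squared-distance weighted average of zs at the grid point (gx, gy)."""
    num = den = 0.0
    for x, y, z in zip(xs, ys, zs):
        d2 = (gx - x) ** 2 + (gy - y) ** 2
        if d2 == 0.0:
            return z  # grid point coincides with a data point
        w = 1.0 / d2  # nearby points dominate the average
        num += w * z
        den += w
    return num / den
```

Repeating this at every node of a regular grid in the x-y plane produces the smoothed surface.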


NEXPO for 3-D Interpolation 


You can use NEXPO to interpolate a smooth surface through points in 3-D. Negative 
exponential weights are computed from the distances between points in a regular grid 
and the irregularly spaced data points in the x—y plane. These weights are used in a 
quadratic function to compute the height of the surface at each grid point. 

Akima (1978) derives a similar method for 3-D interpolation; however, negative 
exponential smoothing outperforms Akima's method on his own data. Akima samples 
the following values from a more detailed surface to see how well his spline 
interpolation method would do in recovering the original surface. 


The input is: 


USE AKIMA 

PLOT Z * Y * X / SMOOTH=NEXPO SYMBOL=1 SIZE=0,
CUT=30 XREV YREV XMIN=0,
XMAX=25 YMIN=0 YMAX=20


The output is: 
Example 19
Smoothing (Mean, Median, Mode and Midrange Smoothers) 


The following plots illustrate moving averages, running means, a modal smooth, and 
a midrange smooth. 

One of the simplest smoothers is the Mean (or moving average) smoother. If a data 
point consists of a smooth component plus random error, and we average several points 
around a single point, then the errors should tend to cancel each other. Median (or 
running median) works like Mean, except values are replaced by the median of the 
window (the surrounding points) instead of the mean. The Mode (or running mode) 
plots the mode of the surrounding points (Scott, 1992). 

Unlike these three smoothers that move “windows” across the range of the x 
variable, Midrange simply draws a line between the points marking the first and third 
quartiles. 
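The window smoothers described above can be sketched in Python. The window here is a fixed number of neighbouring points, truncated at the ends, which is an illustrative assumption; in SYSTAT the Tension setting presumably controls the window width. Passing `statistics.mean` or `statistics.median` as `stat` selects the smoother, and the function name is hypothetical.

```python
import statistics

def running_smooth(xs, ys, window=5, stat=statistics.mean):
    """Replace each y (in x order) by stat() of the y values in a centred window."""
    pairs = sorted(zip(xs, ys))
    y_sorted = [y for _, y in pairs]
    half = window // 2
    smoothed = []
    for i, (x, _) in enumerate(pairs):
        lo = max(0, i - half)  # window is truncated at both ends
        hi = min(len(pairs), i + half + 1)
        smoothed.append((x, stat(y_sorted[lo:hi])))
    return smoothed
```

A single spike pulls the running mean upward but leaves the running median unchanged, which is why the median smoother is more resistant to outliers.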


The input is: 


USE OURWORLD 
BEGIN 
PLOT MCDONALD * URBAN / SMOOTH=MEAN TICK=INDENT SHORT YLOG,
TITLE='Mean' FILL=1 LOC=-3IN,3IN
PLOT MCDONALD * URBAN / SMOOTH=MEDIAN TICK=INDENT SHORT YLOG,
TITLE='Median' FILL=1 LOC=3IN,3IN
PLOT MCDONALD * URBAN / SMOOTH=MODE TICK=INDENT SHORT YLOG,
TITLE='Mode' FILL=1 LOC=-3IN,-3IN
PLOT MCDONALD * URBAN / SMOOTH=MIDRANGE TICK=INDENT SHORT YLOG,
TITLE='Midrange' FILL=1 LOC=3IN,-3IN


END 
The output is:

[Four panels, Mean, Median, Mode, and Midrange smooths: MCDONALD against URBAN]

Example 20 
Modal Smoothing 


The Mode smoother highlights the presence of subpopulations. We plot LITERACY against B_TO_D (the ratio of birth rate to death rate for each country).


The input is: 


USE OURWORLD 
BEGIN 
PLOT LITERACY * B_TO_D / SMOOTH=MODE FILL=1,
LOC=-3IN,0IN
PLOT LITERACY * B_TO_D / SMOOTH=MODE LABEL=COUNTRY$,
FILL=1 CSIZE=1.3 LOC=3IN,0IN
END 
The output is:

[Two panels: modal smooths of LITERACY against B_TO_D, unlabeled and labeled by country]


The smoother identifies three swarms of points in the plot. Printing the country names, we find European countries concentrated in the top left corner, New World countries in the middle, and Islamic countries at the bottom.

What happens when we request separate frames for BIRTH_RT and DEATH_RT (instead of their ratio) against gross domestic product per capita (GDP_CAP)?


The input is: 
PLOT BIRTH_RT DEATH_RT * GDP_CAP / SMOOTH=MODE FILL=1,
XLOG SYMBOL='b','d'


The output is:

[Two panels: modal smooths of BIRTH_RT ('b') and DEATH_RT ('d') against GDP_CAP]


The smoother again forms three distinct curves: Islamic countries predominantly 
toward the left, New World toward the middle, and European toward the right. 
Example 21
Robust Smoothers 


The Andrews, Bisquare, Huber, and Trimmed smoothers use psi functions to downweight the influence of cases with extreme residuals on the estimates of a and b in the regression y = a + bx.

There is little visual difference among the Andrews, Bisquare, and Huber 
smoothers. Here, we also display the usual line of best fit (linear) in each plot. 
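A Huber line can be fit by iteratively reweighted least squares. This Python sketch assumes the conventional tuning constant k = 1.345 and a residual scale estimated as the median absolute residual divided by 0.6745; SYSTAT's constants and iteration rules may differ, and the function names are hypothetical.

```python
import statistics

def weighted_line(xs, ys, ws):
    """Weighted least-squares intercept and slope."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(xs, ys, k=1.345, iterations=20):
    """Huber M-estimate of y = a + bx by iteratively reweighted least squares."""
    a, b = weighted_line(xs, ys, [1.0] * len(xs))  # start from ordinary least squares
    for _ in range(iterations):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        scale = statistics.median([abs(r) for r in residuals]) / 0.6745
        if scale == 0:
            break  # fit is already exact for at least half the points
        # Huber weights: 1 within k scale units, k*scale/|r| beyond.
        ws = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in residuals]
        a, b = weighted_line(xs, ys, ws)
    return a, b
```

With one gross outlier, the ordinary least-squares slope is pulled far off while the Huber slope stays close to the trend of the remaining points.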


The input is: 


USE OURWORLD 
BEGIN 
PLOT MCDONALD * URBAN / SMOOTH=ANDREWS XMIN=20 YMIN=0 TICK=INDENT,
FILL=1 LOC=-3IN,3IN,
TITLE='Andrews'
PLOT MCDONALD * URBAN / SMOOTH=LINEAR XMIN=20 YMIN=0 TICK=INDENT,
LINE DASH=6 XLAB=' ' YLAB=' ',
FILL=1 LOC=-3IN,3IN,
COLOR=BLACK SIZE=0
PLOT MCDONALD * URBAN / SMOOTH=BISQUARE XMIN=20 YMIN=0 TICK=INDENT,
FILL=1 LOC=3IN,3IN,
TITLE='Bisquare'
PLOT MCDONALD * URBAN / SMOOTH=LINEAR XMIN=20 YMIN=0 TICK=INDENT,
LINE DASH=6 XLAB=' ' YLAB=' ',
FILL=1 LOC=3IN,3IN,
COLOR=BLACK SIZE=0
PLOT MCDONALD * URBAN / SMOOTH=HUBER XMIN=20 YMIN=0 TICK=INDENT,
FILL=1 LOC=-3IN,-3IN,
TITLE='Huber'
PLOT MCDONALD * URBAN / SMOOTH=LINEAR XMIN=20 YMIN=0 TICK=INDENT,
LINE DASH=6 XLAB=' ' YLAB=' ',
FILL=1 LOC=-3IN,-3IN,
COLOR=BLACK SIZE=0
PLOT MCDONALD * URBAN / SMOOTH=TRIMMED XMIN=20 YMIN=0 TICK=INDENT,
FILL=1 LOC=3IN,-3IN,
TITLE='Trimmed'
PLOT MCDONALD * URBAN / SMOOTH=LINEAR XMIN=20 YMIN=0 TICK=INDENT,
LINE DASH=6 XLAB=' ' YLAB=' ',
FILL=1 LOC=3IN,-3IN,
COLOR=BLACK SIZE=0
END
Example 22
Contour Plots 



The Scatterplot procedure allows you to produce 2-D contour plots with three variables (x variable, y variable, and z variable). Use Display as 2D Contour to display the z variable as a third variable on a 2-D scatterplot.


SYSTAT first computes its own square grid of interpolated or directly estimated 
values. From this grid, contours are followed using the method of Lodwick and Whittle 
(1970) combined with linear interpolation. This method is guaranteed to find proper 
contours if the grid is fine enough. The standard grid is 30 by 30. To increase this resolution, use Cut to add up to 100 grid cuts. For rough contours, you can reduce computing time by setting cuts to below 30.
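The linear-interpolation step can be sketched in Python for a single grid cell: wherever the contour level lies between the z values at the two ends of a cell edge, the crossing point is placed proportionally along that edge. This illustrates only the interpolation, not Lodwick and Whittle's contour-following method, and the function names are hypothetical.

```python
def edge_crossing(p0, z0, p1, z1, level):
    """Point where `level` crosses the edge p0-p1 by linear interpolation, or None."""
    if (z0 - level) * (z1 - level) > 0 or z0 == z1:
        return None  # level is not between the end values, or the edge is flat
    t = (level - z0) / (z1 - z0)
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))


def cell_crossings(x0, y0, dx, dy, z00, z10, z11, z01, level):
    """Crossings of `level` on the four edges of one grid cell.

    Corner values: z00 at (x0, y0), z10 at (x0+dx, y0),
    z11 at (x0+dx, y0+dy), z01 at (x0, y0+dy).
    """
    corners = [((x0, y0), z00), ((x0 + dx, y0), z10),
               ((x0 + dx, y0 + dy), z11), ((x0, y0 + dy), z01)]
    points = []
    for i in range(4):
        (p0, za), (p1, zb) = corners[i], corners[(i + 1) % 4]
        crossing = edge_crossing(p0, za, p1, zb, level)
        if crossing is not None:
            points.append(crossing)
    return points
```

Joining the crossings found in each cell gives the contour segments; a finer grid (more cuts) makes the piecewise-linear contours follow the surface more closely. A level exactly equal to a corner value is reported on both adjacent edges, which a full implementation would need to handle.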


SYSTAT automatically determines the number of contours to draw so that the surface is delineated and the contour labels are round numbers. To modify this number, set the number of tick marks on the third (z) axis, which determines the number of contours drawn.

We can assign symbols to view the groups of countries: E for European, N for New World, and I for Islamic.


The input is: 


USE OURWORLD 

PLOT LITERACY*DEATH_RT*BIRTH_RT/CONTOUR, ALT=2.5IN, HEI=2.5IN,
WIDTH=3.5IN, ZTICK=8, ZMIN=0,
ZMAX=100, SYMBOL=GROUP$, COLOR=BLACK


The output is: 


Notice that the European countries fall within the 88 to 100% literacy range and have low birth and death rates.


Example 23 
High-Low-Close Plots 


Stock market statistics, whether daily, weekly, or monthly, are often most effectively 
plotted as a set of ranges between high and low prices with a market closing price at 
each period. This is the way most newspapers plot the market. 

In the first plot, we use the numeric variable MONTH as the x variable, which 
determines the points on the x axis at which the high, low, and close values will be 
plotted. In the second plot, we use the categorical variable MONTH$, and we use
ORDER to make sure that the months are not sorted alphabetically.
The input is:


USE HILO 
BEGIN 
PLOT CLOSE HIGH LOW * MONTH / HILO XMIN=0 XMAX=13 FILL=1,
XTICK=13 YLABEL='Stock Price',
LOC=-3IN,0IN
ORDER MONTH$ / SORT=NONE
PLOT CLOSE HIGH LOW * MONTH$ / HILO FILL=1 LOC=3IN,0IN,
YLAB=' '


END 


The output is: 


[Two high-low-close plots of Stock Price: by MONTH (left) and by MONTH$ (right)]


Example 24 
Voronoi Tessellation 


The Voronoi tessellation is also known as the Dirichlet tessellation or the Thiessen 
diagram. Imagine placing little balls of hot roll dough irregularly spaced in a baking 
pan. After letting them rise, you notice that the boundaries between rolls are straight 
and approximately half way between the points where you placed the balls. The same 
thing happens with colonies of yeast irregularly spaced on a Petri dish, grass fires 
started at different points on a plain, or cats establishing their turf in the city. These 
boundaries appear in diverse physical phenomena: Geography (Rhynsburger, 1973), 
Hydrology (Croley and Hartmann, 1985), Ecology (Ripley, 1981), Crystallography 
(Gilbert, 1962), Physics (Miles, 1974), Psychology (Coombs, 1964), and others. 
Voronoi tessellation presumes that you have equivalent distances on the x and y axes. If your scales differ, resize the plot to make its dimensions reflect the true scale values. For example, if your vertical scale runs from 0 to 50 and the horizontal scale runs from 0 to 100, make the physical width twice the height. Voronoi tessellation can take a long time to compute: computing time is on the order of the square of the number of points.
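The equal-distance idea behind the tessellation can be illustrated by brute force in Python: label each point of a fine grid with its nearest site, and the cell boundaries appear where labels change, halfway between neighbouring sites. Because it uses plain Euclidean distance, it shares the equal-units assumption described above. This is an illustration only, not SYSTAT's algorithm, and the function names are hypothetical.

```python
def nearest_site(sites, x, y):
    """Index of the site closest to (x, y); ties go to the lower index."""
    return min(range(len(sites)),
               key=lambda i: (sites[i][0] - x) ** 2 + (sites[i][1] - y) ** 2)


def voronoi_labels(sites, xmin, xmax, ymin, ymax, nx, ny):
    """Label an nx-by-ny grid (nx, ny >= 2) with the index of each point's nearest site.

    Adjacent grid points with different labels straddle a Voronoi boundary.
    """
    labels = []
    for j in range(ny):
        y = ymin + (ymax - ymin) * j / (ny - 1)
        labels.append([nearest_site(sites, xmin + (xmax - xmin) * i / (nx - 1), y)
                       for i in range(nx)])
    return labels
```

For two sites at x = 0 and x = 10, the labels switch between x = 5 and x = 6, the grid points on either side of the halfway boundary.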

We use the EURONEW file, which contains 27 European countries and the 
longitude, LABLON, and latitude, LABLAT, of their capitals. 


The input is: 


USE EURONEW 
BEGIN 
MAP / PROJECT=MERCATOR XLAB='' YLAB='',
AXES=BOX SCALE=NONE XMIN=-15 XMAX=35,
YMIN=30 YMAX=60 COLOR=BLACK
PLOT LABLAT * LABLON / PROJECT=MERCATOR VORONOI,
DASH=7 AXES=NONE SCALE=NONE,
XMIN=-15 XMAX=35 YMIN=30 YMAX=60,
SYMBOL=1 SIZE=.5 FILL
END 


The output is: 


Example 25 
Delaunay Triangulation 


Delaunay triangulation is the dual of the Voronoi tessellation: it joins each pair of points whose Voronoi polygons share an edge, which partitions the region into triangles (Okabe, Boots & Sugihara, 1992).
The input is:


USE EURONEW 
BEGIN 
MAP / PROJECT-MERCATOR XLAB='' YLAB='', 
AXES=BOX SCALE=NONE XMIN=-15 XMAX=35,
YMIN=30 YMAX=60 COLOR=BLACK
PLOT LABLAT * LABLON / PROJECT=MERCATOR DELAUNAY,
DASH=7 AXES=NONE SCALE=NONE,
XMIN=-15 XMAX=35 YMIN=30 YMAX=60,
SYMBOL=1 SIZE=.5 FILL 
END 


The output is: 


Example 26 
Minimum Spanning Trees 


A minimum spanning tree connects a set of points in a space so that the sum of the
lengths of the connecting line segments is as small as possible (Hartigan, 1975).

Imagine, for example, that you have a map of the world and must connect one city 
in each country with a computer network. The network may have any shape, provided 
there is only one path along the network from one city to any other. You wish to spend 
as little as possible on optical cable for the network. The solution to your problem is a 
minimum spanning tree. 
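A minimum spanning tree over a set of points can be built greedily. The Python sketch below uses Prim's algorithm on straight-line distances; it is a standard method shown for illustration, and the text does not say which algorithm SYSTAT uses. The example points are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (edges, total length)."""
    n = len(points)
    if n == 0:
        return [], 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known link from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges, total = [], 0.0
    for _ in range(n):
        # Add the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
            total += best[u]
        # Update the shortest links from the tree to the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges, total

edges, length = minimum_spanning_tree([(0, 0), (3, 0), (3, 4), (10, 4)])
print(edges, length)  # prints [(0, 1), (1, 2), (2, 3)] 14.0
```

Because every edge leaves exactly one path between any two points, the result matches the "only one path along the network" condition in the cable example.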

A minimum spanning tree presumes you have equivalent distances on the x and y 
axes. If your scales differ, adjust the size of the graph to make the physical dimensions 
of your plot reflect the true scale values. For example, if your vertical scale runs from
0 to 50 and the horizontal scale runs from 0 to 100, make the width twice the height. 

You can overlay a minimum spanning tree on a Voronoi tessellation. Each span will be
perpendicular to the polygon edge that separates the two points it connects.


The input is: 


USE EURONEW
BEGIN
MAP / PROJECT=MERCATOR XLAB='' YLAB='',
AXES=BOX SCALE=NONE XMIN=-15 XMAX=35,
YMIN=30 YMAX=60 COLOR=BLACK
PLOT LABLAT * LABLON / PROJECT=MERCATOR SPAN,
DASH=7 AXES=NONE SCALE=NONE,
XMIN=-15 XMAX=35 YMIN=30 YMAX=60,
SYMBOL=1 SIZE=.5 FILL
END


The output is: 


Example 27 
Traveling Salesman Algorithm 


The traveling salesman path plots the traveling salesman algorithm’s solution for the 
shortest path connecting the points in a scatterplot. The algorithm tries to find the 
shortest route for a salesman to visit a set of locations and return home. Thus, it tries 
to find the shortest possible closed path that connects the points. Repetitions (two
visits to the same city) are not allowed. The algorithm does not necessarily find the
best solution, but it finds a good approximation.

Suppose that you have a map of Europe and you wish to travel to all the European
countries. You need a route that keeps travel time and cost low, subject to the
constraint that you visit each country once and only once. The traveling salesman
algorithm provides a solution to this problem.

The traveling salesman algorithm goes through a maximum of 100 iterations, 
displaying the path length and the number of moves for each iteration. Then, 
SYSTAT's approximation for the best path is plotted. The traveling salesman path is 
adapted from Press, Flannery, Teukolsky, and Vetterling (1992). 
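SYSTAT's path is adapted from Press, Flannery, Teukolsky, and Vetterling (1992). The Python sketch below swaps in a different, simpler heuristic, 2-opt improvement: starting from case order, it reverses a stretch of the route whenever that shortens the closed tour, for at most `max_passes` sweeps (a hypothetical parameter loosely echoing the 100-iteration limit). Like SYSTAT's approximation, it finds a good tour but not necessarily the best one.

```python
import math

def tour_length(points, tour):
    """Length of the closed path visiting points in `tour` order and returning home."""
    return sum(math.dist(points[tour[i]], points[tour[i - 1]]) for i in range(len(tour)))

def two_opt(points, max_passes=100):
    """Shorten a closed tour by reversing segments (2-opt) until no reversal helps."""
    tour = list(range(len(points)))  # start by visiting the cases in file order
    best = tour_length(points, tour)
    for _ in range(max_passes):
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                length = tour_length(points, candidate)
                if length < best - 1e-12:
                    tour, best, improved = candidate, length, True
        if not improved:
            break
    return tour

# Four corners of a unit square, listed so that case order crosses itself.
square = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(tour_length(square, two_opt(square)))  # prints 4.0
```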


The traveling salesman path can take a long time to calculate. This option is not 
recommended for data files with a large number of cases. 


The input is: 


USE EURONEW
BEGIN
MAP / PROJECT=MERCATOR, XLAB='', YLAB='', AXES=BOX, SCALE=NONE,
XMIN=-15 XMAX=35, YMIN=30, YMAX=60, COLOR=BLACK
PLOT LABLAT*LABLON / PROJECT=MERCATOR, TSP, DASH=7, AXES=NONE,
SCALE=NONE, XMIN=-15, XMAX=35, YMIN=30, YMAX=60,
SYMBOL=1, SIZE=0.5, FILL, LABEL=COUNTRY$
END


The output is: 
Example 28
Vector Plots 


Vector lines to x (y or z) work like vertical spikes to y, except that each point is 
connected to one point rather than to an axis or plane. This type of plot is especially 
useful for representing factor loadings and other vector models. 

We use the OURWORLD data, compute principal components (PCA) with the 
Factor procedure, and save the loadings. We then transpose the loadings file and plot 
the first two components. 


The input is: 


FACTOR
USE OURWORLD
LET (GDP_CAP, POP_1983, POP_1986) = L10(@)
LET (POP_1990, POP_2020) = L10(@)
LET (MIL, EDUC) = SQR(@)
SAVE MYLOAD / LOAD
MODEL URBAN BIRTH_RT DEATH_RT GDP_CAP,
MIL EDUC B_TO_D LITERACY POP_1983,
POP_1986 POP_1990 POP_2020
ESTIMATE / METHOD=PCA ROTATE=OBLIMIN

USE MYLOAD
TRANSPOSE
PLOT COL(2) * COL(1) / VECTOR=0 LABEL=LABEL$ TICK=INDENT


Here are the transposed MYLOAD data:

LABEL$      COL(1)   COL(2)   COL(3)
POP_1983     0.090    0.990    0.032
POP_1986     0.057    0.994    0.030
POP_1990     0.028    0.997    0.016
POP_2020    -0.197    0.982   -0.056
URBAN        0.847    0.156   -0.221
BIRTH_RT    -0.937    0.014    0.016
DEATH_RT    -0.462    0.007    0.862
GDP_CAP      0.971   -0.051    0.058
EDUC         0.875   -0.125    0.191
MIL          0.769    0.100    0.150
LITERACY     0.899   -0.068   -0.256
B_TO_D      -0.486   -0.046   -0.802
The output is:



Vector plot with variables 


You can also supply variables, instead of a constant, to the VECTOR option.


The input is: 


USE AFIFI 
PLOT SYSINCR*DRUG*DISEASE / VECTOR=SYSINCR, DRUG, DISEASE


The output is: 


If the variables listed after VECTOR are the plot's x, y, and z variables, in that order,
the graph has no vector lines.
The input is:


USE AFIFI 
PLOT SYSINCR*DRUG*DISEASE / VECTOR=DISEASE, DRUG, SYSINCR


The output is: 


Example 29 
Sunflower Displays 


Sunflower symbols show overlapping data. Sometimes we have data that overlap at 
exactly the same values. For example, we may have a questionnaire with a 7-point 
scale and want to plot two items against each other. There are only 49 possible points 
where we may plot the data. Similarly, we may have aggregate data and want to plot 
them in a scatterplot. 

To represent the overlap in these data, you can plot them with special symbols that 
are lighter for small values and darker for larger values of a COUNT variable. Most of 
them look like flowers, so this is often called a sunflower plot. Only nine symbols are 
possible, so larger counts are plotted with the darkest, largest symbol (a filled circle). 

If you have a count variable, as in this example, weight the cases by it (with FREQ)
before using sunflower symbols. If your data are not aggregated but nevertheless have
duplicate pairs of values, SYSTAT counts the duplicates for you.
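SYSTAT's exact mapping from counts to flower symbols is not documented here. The Python sketch below assumes, purely for illustration, that a location with k cases gets symbol number k and that every count of nine or more shares the ninth, darkest symbol. It tallies duplicate (x, y) pairs, optionally weighted by a count variable in the way FREQ COUNT weights the cases.

```python
from collections import Counter

N_SYMBOLS = 9  # the text says only nine sunflower symbols are possible

def sunflower_symbols(xs, ys, weights=None):
    """Map each distinct (x, y) location to a hypothetical symbol number 1..9.

    Duplicate pairs are tallied; `weights` plays the role of a FREQ count
    variable. Counts of nine or more share the last, darkest symbol.
    """
    if weights is None:
        weights = [1] * len(xs)
    counts = Counter()
    for x, y, w in zip(xs, ys, weights):
        counts[(x, y)] += w
    return {pair: min(n, N_SYMBOLS) for pair, n in counts.items()}

print(sunflower_symbols([1, 1, 2, 7], [3, 3, 5, 5], weights=[1, 1, 15, 1]))
# prints {(1, 3): 2, (2, 5): 9, (7, 5): 1}
```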


The input is: 


USE USCOUNT 
FREQ COUNT 
PLOT PERSON * PROPERTY / FLOWER SIZE=2 
Example 30
Three-Dimensional Line Plots




You can plot line graphs in three dimensions. The following data (in the file SPIRAL)
produce a spiral in three dimensions:


      X        Y        Z        R    THETA
  0.130    0.036    0.923    0.270    0.135
  0.087    0.127    0.846    0.970    0.154
  0.028    0.229    0.769    1.449    0.231
 -0.109    0.288    0.692    1.933    0.308
 -0.288    0.255    0.615    2.417    0.385
 -0.448    0.110    0.538    2.901    0.461
 -0.532   -0.129    0.461    3.380    0.547
 -0.460   -0.408    0.385    3.868    0.615
 -0.245   -0.648    0.308    4.351    0.693
  0.093   -0.764    0.231    4.833    0.770
  0.481   -0.696    0.154    5.317    0.846
  0.818   -0.428    0.077    5.801    0.923
  1.000    0.001    0.000    0.001    1.000
  0.953    0.502   -0.077    0.485    1.077
  0.654    0.951   -0.154    0.968    1.154
  0.147    1.222   -0.231    1.451    1.231
 -0.466    1.222   -0.308    1.936    1.308
 -1.038    0.917   -0.385    2.418    1.385
 -1.420    0.347   -0.462    2.902    1.462
 -1.493   -0.371   -0.539    3.386    1.538
 -1.207   -1.074   -0.616    3.869    1.616
 -0.597   -1.584   -0.693    4.352    4.352
  0.217   -1.756   -0.770    4.835    1.769
  1.052   -1.517   -0.846    5.319    1.846
  1.705   -0.890   -0.923    5.802    1.923
  2.000    0.005   -1.000    0.002    2.000
  1.837    0.970   -1.077    0.486    2.077
  1.219    1.776   -1.154    0.969    2.154
  0.263    2.216   -1.231    1.453    2.232
 -0.824    2.156   -1.308    1.936    2.308


If you do not want to see the symbols that the lines connect, make them invisible
with SIZE=0.
The input is:
USE SPIRAL
PLOT Z * Y * X / LINE SIZE=0


Here is a plot of these data connected with a line. We generated these values using
SYSTAT. If you want a smoother curve, generate more points.


The output is: 
Spirals and other geometric figures appear frequently in the sciences. Krumhansl and
Shepard (1979) demonstrated that the perception of musical pitch is organized in a 
three-dimensional spiral like the one here. 
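To generate more points for a smoother spiral, you need a formula for it. The SPIRAL values appear consistent, to within rounding, with x = t cos(2πt), y = t sin(2πt), z = 1 - t, with t stepping by 1/13 (for example, the row with Z = 0.000 has X = 1.000, and the row with Z = -1.000 has X = 2.000). That formula is an inference from the table, not something the text documents. A Python sketch that samples it densely:

```python
import math

def spiral(n_points, t_start=0.0, t_end=2.308):
    """Sample x = t*cos(2*pi*t), y = t*sin(2*pi*t), z = 1 - t at evenly spaced t.

    The formula is inferred from the SPIRAL table, not documented by SYSTAT.
    The default t_end = 2.308 matches the largest radius in the table.
    """
    step = (t_end - t_start) / (n_points - 1)
    rows = []
    for k in range(n_points):
        t = t_start + k * step
        angle = 2 * math.pi * t
        rows.append((t * math.cos(angle), t * math.sin(angle), 1 - t))
    return rows

# 300 points instead of 30 give a visibly smoother curve.
for x, y, z in spiral(300)[:3]:
    print(f"{x:.3f} {y:.3f} {z:.3f}")
```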


Example 31 
Polar Coordinates 


Ordinarily, you draw plots in rectangular (Cartesian) coordinates. You can also plot in 
polar or triangular coordinates. 

SYSTAT automatically scales polar graphs as well as rectangular graphs. The units 
printed on the scales are those of the data. You may alter these scales with Axes. Use 
the y axis for the r axis (distance) and the x axis for the θ axis (angle). The minimum
value on the y axis is always at the center of the circle and the maximum at its 
periphery. The minimum and maximum for the x axis always coincide at the right edge 
(0 radians). 
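The mapping just described can be written down directly. In the Python sketch below, a data point (x, y) is placed on a polar frame: y is rescaled so that its minimum sits at the center and its maximum on the rim, and x is rescaled to one full turn starting at 0 radians. The unit radius and the counterclockwise direction are assumptions made for the illustration; the text does not specify them.

```python
import math

def polar_position(x, y, xmin, xmax, ymin, ymax):
    """Place data point (x, y) on a polar frame of radius 1 (an assumed size).

    y, the r axis, goes from ymin at the center to ymax at the rim.
    x, the theta axis, goes once around the circle starting at the right
    edge (0 radians), so xmin and xmax land at the same angle.
    The counterclockwise direction is an assumption.
    """
    radius = (y - ymin) / (ymax - ymin)
    angle = 2 * math.pi * (x - xmin) / (xmax - xmin)
    return radius * math.cos(angle), radius * math.sin(angle)

# Scales from the POLAR example: XMIN=0 XMAX=360 YMIN=11 YMAX=15.
print(polar_position(0.0, 12.1, 0, 360, 11, 15))    # ANGLE 0: on the positive x axis
print(polar_position(360.0, 12.1, 0, 360, 11, 15))  # ANGLE 360: same spot, up to rounding
```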

All other options work in polar coordinates. Use them intelligently, however, or you
will get bizarre graphs. 

The following data, POLAR, show the highest frequency (in thousands of cycles per
second) perceived by a subject listening to a constant amplitude sine wave generator
oriented at various angles relative to the subject. Zero degrees corresponds to straight
ahead of the subject.

ANGLE   FREQ    ANGLE   FREQ
  0.0   12.1    200.0   12.0
 20.0   12.4    220.0   12.3
 40.0   12.4    240.0   12.8
 60.0   12.6    260.0   14.1
 80.0   12.9    280.0   14.3
100.0   12.8    300.0   13.8
120.0   12.7    320.0   13.4
140.0   12.4    340.0   12.6
160.0   12.1    360.0   12.1
180.0   11.9
The input is:
USE POLAR
PLOT FREQ * ANGLE / POLAR SMOOTH=SPLINE FILL=1,
SYMBOL=1 XMIN=0 XMAX=360,
YMIN=11 YMAX=15
Here is the polar plot of these data with the points connected by a spline smooth.


The output is: 




Notice that the profile shows hearing impairment in the left ear relative to the right. 


Example 32 
Cylindrical Coordinates 


You can draw plots in cylindrical coordinates. In three dimensions, SYSTAT plots each
point as [θ, r, z], where r is the distance from the origin to the projection of the point
onto the xy plane, θ is the angle between the positive x axis and the vector from the
origin to that projection, and z is the vertical distance from the point to the rθ plane.
Polar coordinates for 3-D plots are actually cylindrical coordinates.
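Converting between the two systems uses the standard formulas x = r cos θ and y = r sin θ, with z unchanged. The Python sketch below is a generic illustration, not SYSTAT code. Applied to row 14 of SPIRAL (X = 0.953, Y = 0.502, Z = -0.077), it recovers an angle of about 0.485 radians and a distance of about 1.077, matching the values 0.485 and 1.077 listed in that row.

```python
import math

def cylindrical_to_cartesian(theta, r, z):
    """Convert a point [theta, r, z] to Cartesian (x, y, z)."""
    return r * math.cos(theta), r * math.sin(theta), z

def cartesian_to_cylindrical(x, y, z):
    """Inverse conversion; theta is returned in the range [0, 2*pi)."""
    theta = math.atan2(y, x) % (2 * math.pi)
    r = math.hypot(x, y)  # distance from the z axis, not from the origin
    return theta, r, z

theta, r, z = cartesian_to_cylindrical(0.953, 0.502, -0.077)
print(f"theta={theta:.3f} r={r:.3f} z={z:.3f}")  # prints theta=0.485 r=1.077 z=-0.077
```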

In the three-dimensional line plot example, we plot a spiral in three dimensions. 
The variables R and THETA are the polar coordinate equivalents to x and y. 

Here, we use polar coordinates to plot the spiral. 


The input is: 


USE SPIRAL 
PLOT Z * R * THETA / POLAR LINE XMIN=0,
XMAX=6.28 XTICK=4 SIZE=0
The output is:


Example 33 
Triangular Coordinates 


You can plot three variables in two dimensions with triangular coordinates. Triangular 
coordinate plots are usually done on mixture data. The following data, COLOR, 
represent five color hues as a mixture of RED, GREEN, and BLUE. 


COLOR$   RED    GREEN  BLUE
Canary   0.47   0.47   0.06
Orange   0.70   0.26   0.04
Brown    0.40   0.40   0.20
White    0.33   0.33   0.33
Violet   0.49   0.02   0.49
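A three-part mixture is placed inside the triangle as a weighted average of the triangle's corners, with the proportions as weights. The Python sketch below puts RED, GREEN, and BLUE at hypothetical corners of an equilateral triangle; SYSTAT's actual orientation may differ. It rescales each row to sum to 1 first, which matters for White, whose parts sum to 0.99.

```python
import math

# Hypothetical corner layout for RED, GREEN, BLUE: an equilateral triangle with unit sides.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

def triangle_position(red, green, blue):
    """Plot position of a mixture: the proportion-weighted average of the corners."""
    total = red + green + blue
    weights = (red / total, green / total, blue / total)  # rescale so parts sum to 1
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y

print(triangle_position(0.33, 0.33, 0.33))  # White: the center of the triangle
print(triangle_position(0.70, 0.26, 0.04))  # Orange: closest to the RED corner
```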


Here is a triangular plot of these data with each point labeled by its corresponding 
color. 


The input is: 
USE COLOR 
PLOT RED * GREEN * BLUE / TRIANGLE SYMBOL=COLOR$,
XMIN=0 XMAX=1 YMIN=0,
YMAX=1 ZMIN=0 ZMAX=1
The output is:


Notice that White is an even mixture of the three colors and is thus located in the center. 
Chapter


Multivariate Displays 


Leland Wilkinson 


SYSTAT includes several displays useful for describing values of three or more 
variables: Andrews's Fourier plots, parallel coordinate displays, scatterplot matrices 
(SPLOMs), icon plots, and multiplots.

An Andrews's Fourier plot constructs a wave form (for each case) made up of 
sine and cosine components—using a different sine or cosine component for each 
variable you include. Cases with similar values have wave forms with similar shapes, 
making it easy to recognize distinct subgroups of cases. 

A parallel coordinate display draws an "axis" for each variable and positions 
them side by side so that they are parallel. The same scale is used for each axis. The 
values for a case are plotted on the axes and connected with a line segment. The 
patterns of the connecting lines vary by subgroup. 

A scatterplot matrix (SPLOM) positions a bivariate scatterplot for each specified 
pair of variables in a row and column display. A common vertical scale is used for all 
plots within a row, and a common horizontal scale is used within a column. Thus, the 
display is a matrix of scatterplots, each one corresponding to an entry in a correlation 
matrix for the variables. 

An icon plot represents the values of multiple variables as cartoon faces, Fourier 
blobs, stars, histograms, and other shapes. For some data, you will be able to identify 
groups of icons (cases) that elude automated clustering methods because your eye can 
perceive nonlinear, disjunctive relationships. 

A multiplot organizes two-dimensional graphics as a table. Plots appearing in the
table are stratified by one or two grouping variables. The levels of the grouping 
variables appear at the top and left of the table. The variables depicted in each plot 
appear at the bottom and right of the table. 
Sample Displays
Andrews's Fourier Plot Dialog Box


A particularly powerful method for identifying clusters of cases in multivariate data is
to plot their Fourier components. Andrews (1972) developed these plots. Fourier 
functions have the following form: 


$$f_{\mathbf{y}}(t) = \frac{y_1}{\sqrt{2}} + y_2 \sin(t) + y_3 \cos(t) + y_4 \sin(2t) + y_5 \cos(2t) + \cdots$$


where y is a p-dimensional variate and t varies from -π to π (π radians on either
side of 0). The result of this transformation is a set of wave forms made up of sine and
cosine components for each selected variable. Each wave form corresponds to one case 
in the data file. Cases that have similar values across all variables will have 
overlapping wave forms in the plot. Cases with different patterns of variation will have 
contrasting wave forms. 
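The Fourier function above can be evaluated term by term. The Python sketch below follows that equation (the first variable divided by √2, then alternating sine and cosine terms of increasing frequency) and samples one case's curve from -π to π; drawing one such curve per case gives the display. The case values are hypothetical.

```python
import math

def andrews_curve(y, t):
    """Evaluate Andrews's Fourier function for one case y = (y1, ..., yp) at angle t."""
    value = y[0] / math.sqrt(2)
    for k, yk in enumerate(y[1:], start=1):
        harmonic = (k + 1) // 2  # frequencies 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            value += yk * math.sin(harmonic * t)  # y2, y4, ... use sine
        else:
            value += yk * math.cos(harmonic * t)  # y3, y5, ... use cosine
    return value

# Sample one hypothetical case at 101 points from -pi to pi.
ts = [-math.pi + 2 * math.pi * i / 100 for i in range(101)]
curve = [andrews_curve([1.0, 0.5, -0.2, 0.3], t) for t in ts]
```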


To open the Andrews's Fourier Plot dialog box, from the menus choose: 


Graph 
Multivariate Displays 
Andrews's Fourier Plot... 


Waveform variables. Variables to be plotted. You must select at least two variables. 
Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames, or overlaid in a single 
display. 

Overlay multiple graphs into a single frame. You can display multiple graphs in the
same frame rather than side by side. Different line patterns or colors distinguish 
separate plots or subpopulations. 


You can customize the axes, layout, and appearance of the chart. 
Standardizing the Data


When using Andrews's Fourier plots, parallel coordinate displays, or icon plots, all
variables should be measured on the same scale; use Standardize on the Data menu to
transform the variables if necessary. Otherwise, variables with large values can
dominate the display. To place the values of each variable on a 0 to 1 scale, select
Standardize and then Range. For a scale with mean 0 and standard deviation 1, use the
SD option of Standardize.
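The two Standardize options correspond to the transformations sketched below in Python. The sketch is illustrative; in particular, whether SYSTAT's SD option divides by n - 1 or by n is an assumption here (the sketch uses the sample standard deviation, with n - 1).

```python
import statistics

def standardize_range(values):
    """Rescale to the 0 to 1 interval: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize_sd(values):
    """Rescale to mean 0 and standard deviation 1 (sample SD, n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

print(standardize_range([10, 20, 40]))  # prints [0.0, 0.3333333333333333, 1.0]
print(standardize_sd([2, 4, 6]))        # prints [-1.0, 0.0, 1.0]
```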


Parallel Coordinate Display Dialog Box 


Cartesian coordinates are computed on perpendicular axes. This works fine for two- 
and three-dimensional plots. Higher dimensions, however, would be difficult to 
visualize. Inselberg (1985) proposed making coordinate axes parallel for these higher 
dimensional plots. See Wegman (1986) and Curtis, Burton, and Campbell (1987) for 
further information. 


To open the Parallel Coordinate Display dialog box, from the menus choose: 


Graph 
Multivariate Displays 
Parallel Coordinate Display... 


A parallel coordinate display draws an axis for each variable and positions them side 
by side so that they are parallel. The same scale is used for each axis. The values for a 
case are plotted on the axes and connected with a line segment. 


Parallel variables. Variables to be plotted. You must select at least two variables. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames, or overlaid in a single 
display. 

Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different line patterns or colors distinguish 
separate plots or subpopulations. 
You can customize the axes, layout, and appearance of the chart. All variables should
be measured on the same scale; standardize the data if necessary. 

The procedure for Parallel is similar to that for Fourier. Indeed, parallel and 
Fourier coordinates are alternative representations of the same data. The advantage of 
Fourier coordinates is that variables are reduced to a smaller number of features in the 
plot. The advantage of parallel coordinates is that the plot shows the variables 
themselves, which facilitates interpretation. Compare Fourier to icon blobs and 
parallel coordinate displays to icon stars. Blob and star plots are polar coordinate 
versions of Fourier and parallel coordinate plots, respectively. 
Case Labels


In Andrews's Fourier Plot and Parallel Coordinate Display, you can label the cases on
the Case Labels tab.




Display case labels as variable. Select/deselect this option to display the values of the 
selected variable as case labels in the plot. 


Label size. Displays the label with the specified size. 
Scatterplot Matrix Dialog Box


Scatterplot Matrix plots all possible combinations of two or more numeric variables 
against one another. 


To open the Scatterplot Matrix dialog box, from the menus choose: 


Graph 
Scatterplot Matrix (SPLOM)... 




The plots are arranged in rows and columns, with the same number of rows and
columns as there are variables. The point of the plot is simple. When you have many
variables to plot against each other in scatterplots, it is logical to arrange the plots in
rows and columns using a common vertical scale for all plots within a row (and a
common horizontal scale within columns). All complete x-y pairs within each plot are
used; that is, pairwise deletion is used for missing data.
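Pairwise deletion means each cell of the matrix uses every case that is complete for that particular pair of variables, so different cells can rest on different subsets of cases. A small Python sketch with hypothetical values, using None or NaN to mark a missing value:

```python
import math

def pairwise_complete(x, y):
    """Keep only the cases where both x and y are present (None or NaN marks missing)."""
    def present(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))
    return [(a, b) for a, b in zip(x, y) if present(a) and present(b)]

# Hypothetical data for three variables measured on the same four cases.
urban = [40, None, 75, 60]
birth = [30, 25, float("nan"), 18]
literacy = [50, 80, 90, None]
print(pairwise_complete(urban, birth))     # prints [(40, 30), (60, 18)]
print(pairwise_complete(birth, literacy))  # prints [(30, 50), (25, 80)]
```

The URBAN by BIRTH cell uses cases 1 and 4, while the BIRTH by LITERACY cell uses cases 1 and 2.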

A scatterplot matrix is often called a SPLOM (Scatterplot Matrix). It is also called 
a "casement plot" (Cleveland, 1994; Chambers, Cleveland, Kleiner, and Tukey, 1983).
Although this graph has been rediscovered several times, the first published reference 
is in Hartigan (1975), where it is described as a pairwise plot. 


Row and Column variable(s). Variable(s) to be plotted in the matrix. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames or overlaid in a single 
display. 

Density displays in diagonal cells. Inserts density diagrams in the diagonal cells of a 
SPLOM. Specify one of the following: histogram, gap histogram, frequency polygon, 
fuzzygram, box plot, notched box plot, box and symmetrical dot, notched box and
symmetrical dot, dot histogram (Dit), symmetrical dot density, jittered dot density, 
density stripes, normal curve, kernel curve, cumulative histogram, cumulative gap 
histogram, cumulative frequency polygon, cumulative fuzzygram, cumulative normal 
curve, cumulative kernel curve. 

Specify separate row and column variables. Allows you to select separate row and 
column variables for an asymmetric matrix. 

Transpose matrix. Transposes the scatterplot matrix by interchanging the row and 
column variables. 

Only display bottom half of matrix and diagonal. Suppresses display of the upper 
(right) half of the matrix (since these are flipped versions of the plots in the lower half). 
Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different symbols distinguish separate plots or 
subpopulations. 


You can select Options and Smoother tabs to add a regression line with an optional 
confidence band or smoother, and to add ellipses, hulls, lines and partitions. You can 
also customize the axes, layout, and appearance of the chart. 


See Chapter 5 for more information on smoothers. 




Scatterplot Matrix Options 




Confidence ellipse. Draws Gaussian bivariate ellipses for the sample in each plot or 
Gaussian bivariate confidence intervals on the centroid. The difference between Ell 
and Elm is analogous to the standard deviation versus the standard error of the mean. 


■ Sample. With Ell, the resulting ellipse is centered on the sample means of the x and
y variables. The unbiased sample standard deviations of x and y determine its major 
axes and the sample covariance between x and y, its orientation. You can choose 
the size of the ellipse by specifying a probability value between 0 and 1 (both 
exclusive). If you make an extremely large ellipse (0.99), it may extend beyond the 
axes of your plot. The default is 0.6827. 
■ Centroid. As with Ell, the ellipse produced by Elm is centered on the sample means
of the x and y variables. The unbiased sample standard deviations of x and y
determine its major axes and the sample Pearson correlation between x and y
determines its orientation. You can choose the size of the ellipse by specifying a
probability value between 0 and 1 (both exclusive). This size is adjusted by the
sample size so that the ellipse is always smaller than that produced using Ell. The 
default is 0.95. 
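The text does not give SYSTAT's exact scaling formula, so the Python sketch below makes two labeled assumptions: the boundary encloses probability p under a bivariate normal distribution, which puts it at squared Mahalanobis distance -2 ln(1 - p) from the means; and the centroid (Elm) version divides the covariance matrix by n, the way a standard error divides a standard deviation by √n. The ellipse axes come from the eigenvalues of the 2 x 2 covariance matrix.

```python
import math
import statistics

def ellipse_points(x, y, p=0.6827, centroid=False, n_points=64):
    """Points on a Gaussian bivariate ellipse for paired data x, y (a sketch)."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx, syy = statistics.variance(x), statistics.variance(y)  # unbiased (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    if centroid:
        # Assumed Elm scaling: covariance of the mean vector.
        sxx, syy, sxy = sxx / n, syy / n, sxy / n
    c = -2 * math.log(1 - p)  # assumed squared Mahalanobis radius for probability p
    # Eigenvalues and major-axis angle of the 2 x 2 covariance matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, max(half_trace - root, 0.0)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    semi_major, semi_minor = math.sqrt(c * lam1), math.sqrt(c * lam2)
    points = []
    for k in range(n_points):
        t = 2 * math.pi * k / n_points
        u, v = semi_major * math.cos(t), semi_minor * math.sin(t)
        # Rotate by the major-axis angle, then shift to the sample means.
        points.append((mx + u * math.cos(angle) - v * math.sin(angle),
                       my + u * math.sin(angle) + v * math.cos(angle)))
    return points

# Uncorrelated data with equal variances: both ellipses are circles,
# and the centroid version is smaller by a factor of sqrt(n) = 2.
xs, ys = [-1, 1, -1, 1], [-1, -1, 1, 1]
sample = ellipse_points(xs, ys)
mean_ellipse = ellipse_points(xs, ys, p=0.6827, centroid=True)
```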


Confidence kernel (p). A nonparametric kernel density estimator is analogous to a
continuous histogram that shows where the data are most concentrated in the sample.
You can specify a probability value between 0 and 1. The default is 0.6827.


Convex hull around points. You can draw a convex hull around all the points in the 
scatterplot. 

Influence on correlation coefficient. Makes the size of the plot symbol represent the 
extent of influence each point exerts on the Pearson correlation coefficient. A scale to 
the right of the plot helps you judge the extent of influence. (The influence of a point 
is the amount the correlation would change if that point were deleted.) 
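That definition of influence translates directly into a leave-one-out computation. The Python sketch below is illustrative: it reports r without the case minus r with all cases (the sign convention is an assumption), and it would fail if deleting a case left a variable constant. The data are hypothetical, with one outlying case.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_influence(x, y):
    """For each case, the change in r when that case is deleted (r_without - r_all)."""
    r_all = pearson(x, y)
    return [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) - r_all for i in range(len(x))]

x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 0]
influence = correlation_influence(x, y)
print([round(v, 3) for v in influence])  # the last case, (10, 0), has by far the largest influence
```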


Overlapping data. When many points overlap on a scatterplot, you can add a slight 
random jitter to make the points easier to distinguish, or draw a sunflower plot, where 
each symbol or "sunflower" represents one or more cases. Sunflowers are lighter or 
darker depending on the number of cases. Only nine symbols are possible, so larger 
counts are plotted with a filled circle. 
Connectors/partitions. Draw lines connecting or partitioning points in a number of 
ways. 
■ Line connected in case order. Does just what it says.
■ Traveling salesman path. Tries to find the shortest possible closed path that
connects all the points, with no repetitions.
■ Minimum spanning tree. Connects a set of points to minimize the sum of the
lengths of the connecting line segments.
■ Vertical spikes to Y. Draws spikes to a specified value in the y plane.
■ Vector lines from. Connects each point to a single point that you specify.
■ Delaunay triangulation. Partitions the nontriangular polygons of a Voronoi
tessellation into triangles by joining the vertices of the Voronoi polygons.
■ Voronoi tessellation. Also known as the Dirichlet tessellation or the Thiessen
diagram, this option produces straight boundaries halfway between points. This
option presumes you have equivalent distances on the x and y axes.


Possible customizations include: 

■ Adding a minimum spanning tree to connect points from a multidimensional
scaling configuration with three dimensions (the variables in the SPLOM).

■ Using vertical spikes to y to represent deviates from 0 when plotting residuals from
a multivariate regression against the predicted values.

■ Adding vector lines from x or y to plot loadings from a factor analysis, that is,
connecting the point for each loading to the origin.


Icon Plot Dialog Box 


Icons are pictures for displaying multivariate data (Everitt, 1978, and Cleveland,
1993). Given a data set containing measurements of n cases on p variables, you plot n
icons (one for each case) with p different features in each icon.


To open the Icon Plot dialog box, from the menus choose: 


Graph 
Multivariate Displays 
Icon Plot... 


Feature variables. The shape of the icons depends on the mapping of variables to icon 
features, done in the order you select variables. 


Grouping variable(s). Categorical variables that specify group membership. 
Subpopulations can be plotted side by side in separate frames, or overlaid in a single 
display. 


Icon Plot offers nine types of icons for representing multivariate data: 


■ Star. Star icons are profile icons in polar coordinates; the distance of each point
from the center of the icon shows the value of the corresponding variable. Separate
icons are drawn for each case.

■ Profile. Profile icons connect the tops of histogram icon bars with lines and then
erase the bars. Each point represents a single variable, and a separate icon is
drawn for each case.

■ Histogram. Icons assign one histogram bar to each variable. Separate histogram
icons are drawn for each case. It is helpful to first order the variables according to
clustering or some other seriation.

■ Fourier blobs. Polar coordinate Fourier waveforms. Each case in the data set is
represented by a blob, and cases with similar values across all variables will have
similar shapes.

■ Chernoff's faces. Represent many variables by assigning each variable to a distinct
facial feature. Cases with similar values for particular variables will have similar
corresponding facial features.

■ Framed rectangular. Icons represent a single variable by a thermometer-like
display where the height of the dark portion inside the frame shows the value of the
variable. Optionally, you can represent a second variable by icon width and a third
variable by height.

■ Arrow. Icons represent two variables. The first determines the length of the arrow,
and the second determines its direction, 0 to 360 degrees. Arrow plots are useful
for showing fluid flow over a surface.

■ Weather vanes. Icons represent three variables. The first determines the radius of
the central circle, the second determines the length of the vane, and the third
determines its direction.

■ Sun. Icons are similar to star icons. However, the order of the variables is
determined by the first principal component, which makes them easier to interpret.
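For the star icon described above, each variable gets one ray, the rays are evenly spaced around the circle, and each ray's length equals the variable's value. A Python sketch of the vertex computation follows; starting at 0 radians and running counterclockwise are assumptions, and the values should already be standardized to a common nonnegative scale (for example, with Range).

```python
import math

def star_vertices(values, center=(0.0, 0.0)):
    """Outline vertices of one star icon: ray k has length values[k]."""
    p = len(values)
    cx, cy = center
    vertices = []
    for k, v in enumerate(values):
        angle = 2 * math.pi * k / p  # evenly spaced rays, one per variable
        vertices.append((cx + v * math.cos(angle), cy + v * math.sin(angle)))
    return vertices

# One hypothetical case with four range-standardized variables.
outline = star_vertices([1.0, 0.5, 0.25, 0.75])
```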


Overlay multiple graphs into a single frame. You can display multiple graphs in the 
same frame rather than side by side. Different colors distinguish subpopulations. See 
the examples for more complete explanations of each type of icon.


Transpose plot. Transposes the icon plot. 


In addition, you can change coordinate systems, and customize the layout, axes, and 
appearance of the plots. 

You should try several of these icons on the same data to see how they work. You 
should also compare them to automated techniques such as discriminant analysis and 
clustering. For some data, you will be able to locate clusters that elude automated 
methods because your eye can perceive nonlinear, disjunctive relationships. Icons 
cannot replace formal statistical models, but they are indispensable exploratory tools. 
Unlike most graphs in SYSTAT, icons are not designed to communicate absolute
numerical information. They are intended, instead, for recognizing clusters of similar 
objects. Icons are useful for sorting or organizing objects (or cases) that differ across 
several variables. 

Some theorists find the use of icons subjective. They have ridiculed, for example, 
Chernoff's faces as being facetious and cartoon-like. In such conclusions they have 
ignored cognitive science research on multiattribute visual processing, which shows 
that people can accurately categorize multivariate data based on appropriate visual 
cues (Garner, 1974, and Spoehr and Lehmkuhle, 1982). 

When using icons, all variables should be measured on a similar scale. Otherwise, 
one bar in all the icons may be tall and the rest barely visible. Standardize the data if 
necessary. 


Missing data. If a case has one or more values missing among the selected variables, 
SYSTAT does not draw the icon. Instead, it leaves a blank space in the panel of icons 
for each case with values missing. To avoid these empty spaces, select only those cases 
with complete values for the variables before assigning feature variables and grouping
variables. 


Ordering in Icon Plots 


Ordering the variables. The shape of icons depends on the mapping of variables to
icon features. For all SYSTAT icons, this mapping is done in the order in which you
select variables. With Chernoff’s faces, for example, the first variable is assigned to 
mouth curvature, the second to angle of brow, and so on. 

Some have criticized icons for the arbitrariness of this assignment. There are ways 
to circumvent this problem, however. Research in cognitive processing (Garner and 
Felfoldy, 1970) has shown that integrated displays are more effective for 
communicating multidimensional information. Correlated information is best 
presented within integrated features, rather than across disparate features. One way to 
accomplish this is to order the variables by some seriation method before selecting 
them for an icon plot. 

For example, use Cluster Analysis to order tree branches, and then use the ordering 
for icons. Alternatively, order the variables according to the loadings on the first 
principal component of the correlation matrix of variables. Even better, use a
one-dimensional multidimensional scaling of this matrix. This way, similar features of
faces, blobs, stars, and so on will be assigned to correlated variables. Freni-Titulaer,
Lambertina, Louv and William (1984) have demonstrated that cluster-ordered icons
result in fewer judgment errors than randomly ordered icons.
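One of the orderings mentioned above, sorting variables by their loadings on the first principal component of the correlation matrix, can be sketched in plain Python. The sketch assumes complete data with no constant variables, finds the leading eigenvector by power iteration (our choice, not necessarily SYSTAT's method), and uses hypothetical function names. Because the sign of an eigenvector is arbitrary, the resulting order may come out reversed, which serves equally well for icons.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between variables; each column is a list of case values."""
    n = len(columns[0])

    def zscores(col):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        return [(x - mean) / sd for x in col]

    z = [zscores(col) for col in columns]
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def order_by_first_component(names, columns):
    """Variable names sorted by their loading on the first principal component."""
    loadings = first_component(correlation_matrix(columns))
    return [name for _, name in sorted(zip(loadings, names))]


# Hypothetical data: A and B rise together, while C falls as they rise.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.5]
c = [5.0, 4.0, 3.0, 2.0, 1.0]
print(order_by_first_component(["A", "B", "C"], [a, b, c]))
```

The positively correlated pair ends up adjacent, with the negatively correlated variable at one end of the order.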


Ordering the icons. Sometimes you may want to publish a matrix of icons clustered 
according to similarity. One way to do this is to cluster the data and reorder the cases 
according to a seriation based on cluster membership. Then, when you use Icon Plot, 
the icons are arranged in clustered order. Another way to do this is to order the icons 
yourself and rearrange the cases in the file according to your visual ordering. 

For example, print a plot and label each icon with a number. Clip out each icon with 
scissors. Sort them into piles of similar icons. Enter the sequence of numbers as a new 
variable of the data file. Sort the file by that variable. Plot the icons again (in sorted 
order) with their actual labels. This procedure avoids the bias of knowing the 
categories in advance. 


Icon Location 


Icon Location allows you to overlay the icons on a scatterplot or function plot. 


In the Icon Plot dialog box, click the Icon Location tab.
[Figure: Icon Location tab of the Icon Plot dialog box, showing the Available variable(s) list and the X location and Y location fields]


The positions of the icons are determined by the values of the selected x and y 
variables, with each icon centered on the x and y coordinates for the current case. 


Multiplots 


The GROUP option creates multivariate tables of graphics by stratifying a particular 
graphic type (e.g., BAR, DOT, LINE, PPLOT) across the categorical levels of one or two
grouping variables. Examples are shown at the end of this chapter and elsewhere. The 
layout of these plots was designed to allow room for 3D, legends, and other unusual 
features that might require more space. 
The MULTIPLOT option provides a more compact layout for these graphics. This
layout incorporates the best features from Small Multiple plots (Bertin, 1967; Tufte, 
2001), Row Label plots (Carr, 1994), Coplots (Cleveland, 1993), and Trellis graphics 
(Becker, Cleveland, and Shyu, 1996). The MULTIPLOT layout is, in essence, a single 
table of 2D graphics labeled with tabular categories on the top and left and scale values 
on the bottom and right. The layout is governed by the same format options that apply 
to the GROUP option: ROWS determines the number of rows in the table, COLS 
determines the number of columns. The number of variables assigned to GROUP (1 or 
2) determines whether a one-way or two-way table is produced. In addition, the SCALE 
global option determines the aspect ratio of the overall table (see "Multiplots: Two-way"
on page 311).


Using Commands 


To create a graph, first tell SYSTAT what data to use: 


USE filename 
Then specify the name of the procedure (FOURIER, PARALLEL, SPLOM, ICON). 


Andrews's Fourier plot      FOURIER waveform-varlist / options
Parallel coordinate plot    PARALLEL parallel-varlist / options
SPLOM
  One set of variables      SPLOM row-varlist / options
  Two sets of variables     SPLOM column-varlist * row-varlist / options
Icon plot                   ICON feature-varlist / options type
                            where type is ARROW, BLOB, FACE, HIST, PROFILE,
                            VANE, THERM, STAR, SUN


To create a multiplot, specify the name of the procedure (BAR, DOT, LINE, PROFILE,
PYRAMID, PLOT, PPLOT, QPLOT) and use the MULTIPLOT option with GROUP
variables. For example,

PLOT yvar*xvar / GROUP=varlist MULTIPLOT
Examples


Example 1 
Fourier Plots and Parallel Coordinate Displays 


Here, we examine measurements made on 150 iris flowers: sepal length, sepal width, 
petal length, and petal width (in centimeters). The data are from Anderson (1935), 
reprinted in Fisher (1936), and are grouped by species: Setosa, Versicolor, and 
Virginica (coded as 1, 2, and 3, respectively). 

The following examples show an Andrews's Fourier plot and a parallel coordinate
display, with SPECIES as the grouping variable and a different line type for each of
the three groups. The OVERLAY option draws the three groups within a single frame.


The input is: 


BEGIN
USE IRIS
FOURIER SEPALLEN .. PETALWID / GROUP=SPECIES,
OVERLAY DASH=9,1,11,
LEGEND=NONE, FILL=1 LOC=-2IN,0IN
PARALLEL SEPALLEN .. PETALWID / GROUP=SPECIES,
OVERLAY DASH=9,1,11,
LEGEND=NONE, FILL=1 LOC=4IN,0IN
END


The output is: 


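[Figure: Andrews's Fourier plot and parallel coordinate plot of SEPALLEN, SEPALWID, PETALLEN, and PETALWID, with the three species overlaid]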


The Setosa flowers (coded as 1) are drawn with short dashes (line dash type 9);
Versicolor (2), with solid lines (dash type 1); and Virginica (3), with dotted lines (dash
type 11). The wave form for the Setosa flowers is lower than the others and shifted to
the right. In the parallel plot, we see that the Setosa petals are markedly shorter and
narrower than those of the other groups.


Stratifying by Species 


Here, we plot each species in a separate frame. We see the wave form for each group 
and also the pattern of the parallel coordinates more clearly. 


The input is: 


BEGIN
CSIZE 1.2
USE IRIS
FOURIER SEPALLEN .. PETALWID / GROUP=SPECIES ROW=1
PARALLEL SEPALLEN .. PETALWID / GROUP=SPECIES,
YMAX=8 ROW=1,
FILL=1 LOC=-1IN,-.5IN
END
The output is: 
[Figure: one Andrews's Fourier plot (x axis in degrees) and one parallel coordinate plot for each species, SPECIES = 1, 2, 3]


The scaling of the Andrews's Fourier plots differs between computations done separately
for each group and those done for all three groups combined as one sample.
Example 2
Scatterplot Matrices 


In this example, we display the time (in minutes) that people must work to earn enough 
money to purchase a Big Mac against the length of a typical workweek (in hours). We 
also plot these variables versus net hourly earnings (in U.S. dollars) and the percentage 
of a worker's income that goes to taxes and social security. That is, we display 
scatterplots for each pair of four variables (BIG_MAC, WORKWEEK, EARNINGS, and 
PCTTAXES). The RCITY data file has one case (row) for each of the 46 cities in the 
sample. We first display the default SPLOM. 


The input is: 
USE RCITY 


SPLOM WORKWEEK EARNINGS PCTTAXES BIG_MAC / SYM=1, 
SIZE=1.8 FILL=.8 


The output is: 


[Figure: scatterplot matrix of WORKWEEK, EARNINGS, PCTTAXES, and BIG_MAC]


The plot of BIG_MAC versus WORKWEEK is in the lower left corner of the display.
To identify the y variable in other plots, look at the name in its row; to identify the x
variable, read the name of its column. Tick marks and other labels are omitted because
they distract from the clarity of the plot. If you need scale information on any variable,
use Scatterplot to produce a separate plot.

Notice that the scatterplots above the diagonal are mirror images of those below,
because each pair of variables appears twice, once with the axes exchanged.


Adding City Names 


Here we display only the lower half of the matrix and the diagonal, and we add names to
identify the more unusual cities. Because more than 40 names are printed, the display
would be crowded, so we omit the variable EARNINGS.


The input is: 
USE RCITY 


SPLOM WORKWEEK PCTTAXES BIG_MAC / HALF LABEL=CITY$,
FILL=.8 CSIZE=1.8


The output is: 


[Figure: lower-half scatterplot matrix of WORKWEEK, PCTTAXES, and BIG_MAC, with city names as labels]


One of the most useful features of scatterplot matrices is that you can identify outliers 
(outside values) and then track them across other plots to observe values of other 
variables for the case. For example, we now see that Mexico City is the lone point at 
the top of the bottom left plot. Since the vertical scale is the same for all plots within a 
row, we can skip back to the first SPLOM and track Mexico City across the bottom row 
of plots. Its values of EARNINGS and PCTTAXES are low relative to those for the other
cities in the sample. 

We find that Hong Kong has the longest workweek (45.7 hours). Its working time 
to buy a Big Mac is among the shorter times (24 minutes) and its tax rate is among the 
lowest (5.8%). For PCTTAXES, Stockholm, Sweden, stands out with a high of 55%, but
its workweek (34.7 hours) is shorter than that for many cities and the working time to 
buy a Big Mac is about an hour (61 minutes). In the first SPLOM, Zurich and Geneva 
are the cities with the largest EARNINGS. 


Symbols for Regions of the World 


You can incorporate information about an additional numeric or string variable within
each bivariate plot. The values of REGION$ identify the geographic location of each
city as Europe, Africa, N. America, M.&S. America, and Pacific/Asia. With
SYMBOL=REGION$, each plot symbol is the first letter of the city's region name.
The input is: 

USE RCITY 


SPLOM WORKWEEK EARNINGS PCTTAXES BIG_MAC / HALF FILL=1, 
SYMBOL=REGION$ SIZE=2 


The output is: 


[Figure: lower-half scatterplot matrix of WORKWEEK, EARNINGS, PCTTAXES, and BIG_MAC, with region letters as plot symbols]
Example 3
SPLOM for Fisher's Iris Data 


In this example, we study four measurements of irises in a SPLOM and identify the 
species. 


The input is: 
USE IRIS 
SPLOM SEPALLEN .. PETALWID / SYMBOL=SPECIES COLOR=SPECIES,
HALF SIZE=1.3 FILL=.75,
LEGEND=NONE


SPECIES is a numeric variable with codes 1, 2, and 3, so SYSTAT uses its built-in 
symbols and colors that correspond to these codes. 


The output is: 


[Figure: lower-half scatterplot matrix of SEPALLEN, SEPALWID, PETALLEN, and PETALWID, with symbols and colors by species]


With only a quick glance, it is easy to see that the Setosa flowers (circles) stand apart 
from the others. 
Multivariate Displays


In the parallel coordinate plot example, we notice that the Setosa sepals are much
larger than the Setosa petals, relative to the other two groups. Measures derived from
the four variables, such as shape, area, and the ratio of petal width to sepal width, might
characterize differences among the groups. We plot these derived measures in a single
SPLOM, using symbols and colors to differentiate the species.


The input is: 


USE IRIS
LET SEP_AREA = SEPALWID * SEPALLEN
LET PET_AREA = PETALWID * PETALLEN
LET SEP_SHAPE = SEPALWID / SEPALLEN
LET PET_SHAPE = PETALWID / PETALLEN
LET RATIO_SP = PETALWID / SEPALWID
SPLOM SEP_AREA PET_AREA SEP_SHAPE PET_SHAPE RATIO_SP,
/ SYMBOL=SPECIES SIZE=2 HALF,
COLOR=SPECIES LEGEND=NONE FILL=.75


The output is: 


[Figure: lower-half scatterplot matrix of SEP_AREA, PET_AREA, SEP_SHAPE, PET_SHAPE, and RATIO_SP, with symbols and colors by species]


The separation of the Versicolor and Virginica flowers is still not as great as that of the
Setosa flowers, but it is clearer on a color monitor.
One Set of Variables against Another


A combination of the original variables and the derived measures might be the best
discriminator among the groups. We could examine a SPLOM of all nine variables or, as
we do here, just the cross-classification of the original variables with the derived
measures. Since it is clear that the Setosa flowers differ markedly from the other two
groups, we eliminate them.


The input is: 
USE IRIS 


SELECT SPECIES <> 1
LET SPECIES1 = SPECIES - 1
LET SEP_AREA = SEPALWID * SEPALLEN
LET PET_AREA = PETALWID * PETALLEN
LET SEP_SHAPE = SEPALWID / SEPALLEN
LET PET_SHAPE = PETALWID / PETALLEN
LET RATIO_SP = PETALWID / SEPALWID
SPLOM SEP_AREA PET_AREA SEP_SHAPE PET_SHAPE RATIO_SP,
* SEPALLEN .. PETALWID / SYMBOL=SPECIES1,
SIZE=2 COLOR=SPECIES1,
LEGEND=NONE FILL=.75


The output is:

[Figure: SEP_AREA, PET_AREA, SEP_SHAPE, PET_SHAPE, and RATIO_SP plotted against SEPALLEN, SEPALWID, PETALLEN, and PETALWID for the Versicolor and Virginica flowers]
Example 4
Density Displays in Scatterplot Matrices 


Following Hartigan (1975), you can insert density diagrams in the diagonal cells of a 
SPLOM. In this example, we display box plots instead of the default histograms. 


The input is: 
USE RCITY 
SPLOM WORKWEEK EARNINGS PCTTAXES BIG_MAC / DENSITY=BOX FILL=.80


The output is: 


[Figure: scatterplot matrix of WORKWEEK, EARNINGS, PCTTAXES, and BIG_MAC with box plots on the diagonal]


Outliers appear in the box plots of all variables except EARNINGS.
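Box plots in this style conventionally flag outside values with Tukey's rule. The Python sketch below assumes that rule: hinges are the medians of the lower and upper halves of the data, and the inner fences lie 1.5 hinge-spreads beyond them. The function names are hypothetical, and SYSTAT's exact hinge definition is not stated in this chapter.

```python
import statistics


def tukey_hinges(values):
    """Lower and upper hinges: medians of the lower and upper halves of the sorted data.

    When n is odd, the median itself belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return statistics.median(xs[:half]), statistics.median(xs[n - half:])


def box_plot_outliers(values, k=1.5):
    """Values beyond the inner fences: lower hinge - k*H and upper hinge + k*H,
    where H is the hinge spread (upper hinge minus lower hinge)."""
    lower, upper = tukey_hinges(values)
    spread = upper - lower
    low_fence, high_fence = lower - k * spread, upper + k * spread
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical workweeks (hours) with one unusually long value.
hours = [38.0, 39.5, 40.0, 40.0, 41.0, 42.0, 43.0, 55.0]
print(box_plot_outliers(hours))
```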


Dot Histogram Displays and Kernel Estimators 


Using the Fisher iris data, we illustrate dot histogram and kernel density estimators for 
each univariate distribution. 
The input is:
USE IRIS
BEGIN
SPLOM SEPALLEN .. PETALWID / DENSITY=DIT,
FILL=1 LOC=-2IN,0IN
SPLOM SEPALLEN .. PETALWID / DENSITY=KERNEL,
FILL=1 LOC=3IN,0IN
END
The output is: 


[Figure: scatterplot matrices of SEPALLEN, SEPALWID, PETALLEN, and PETALWID with dit plots (left) and kernel density curves (right) on the diagonal]


That the values of PETALLEN and PETALWID do not come from a single homogeneous
sample is more apparent in the dit and kernel displays than in the default
histograms.
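A kernel density estimator places a small bump at every observation and adds the bumps together, so two clusters of values show up as two peaks rather than one. The Python sketch below uses a Gaussian kernel. The Gaussian kernel and the default bandwidth (Silverman's rule of thumb) are assumptions made for illustration, not a description of SYSTAT's kernel display, and the data are hypothetical.

```python
import math
import statistics


def gaussian_kde(data, bandwidth=None):
    """Return a function that estimates the density of `data` at a point x."""
    n = len(data)
    if bandwidth is None:
        # Silverman's rule of thumb; an assumed default.
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical petal lengths from two groups: the estimate dips between them.
lengths = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3]
f = gaussian_kde(lengths, bandwidth=0.3)
print(f(1.15), f(3.15), f(5.15))
```

A smaller bandwidth makes the curve follow individual points more closely; a larger one smooths the two peaks together.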


Example 5 
Subpopulations 


As with Scatterplot, SYSTAT can draw features separately for each group within a
single frame. To do this, specify a grouping variable and select Overlay multiple graphs
into a single frame. In the first graph below, we add a sample confidence ellipse for
each group and draw normal curves along the diagonal of the display. In the second
display, we use kernel density contours instead of sample ellipses (ELL), and kernel
curves instead of normal curves.
The input is:

BEGIN
USE IRIS
SPLOM SEPALLEN .. PETALWID / GROUP=SPECIES OVERLAY,
ELL DENSITY=NORMAL LEGEND=NONE,
FILL=1 LOC=-2IN,0IN COLOR=2,1,10
SPLOM SEPALLEN .. PETALWID / GROUP=SPECIES OVERLAY,
KERNEL DENSITY=KERNEL LEGEND=NONE,
FILL=1 LOC=3IN,0IN COLOR=2,1,10
END


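The output is:

[Figure: overlaid scatterplot matrices of SEPALLEN, SEPALWID, PETALLEN, and PETALWID by species, with sample ellipses and normal curves (left) and kernel contours and kernel curves (right)]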


Example 6 
Line SPLOM's 


Time series and similar data are best plotted with the points connected by a line. The
data in the LABOR file are output productivity per labor hour in 1977 U.S. dollars for a
25-year period, from the U.S. Bureau of Labor Statistics.


YEAR      US   CANADA   JAPAN   GERMANY   ENGLAND
1960    62.2    50.36    23.2      40.3      53.8
1965    76.6    62.7     35.0      54.0      63.9
1970    80.0    76.8     64.8      71.2      71.6
1975    92.9    91.8     87.7      90.1      94.3
1980   101.4   101.9    122.7     108.6     101.2
1985   121.8   115.1    159.9     131.9     129.7
The data are in time order in the file. If the data were in some other order, selecting Line
connected in case order would connect successive points and produce a messy spider 
web in each cell of the SPLOM. Here is a SPLOM of each country's productivity 
against the YEAR variable. 


The input is: 


USE LABOR 
SPLOM GERMANY JAPAN CANADA US ENGLAND*YEAR/,
HEIGHT=3IN WIDTH=6IN,
LINE SIZE=0 YMIN=0,
YMAX=200


Notice that we set the symbol size to 0 to make the symbols invisible. We also set
the minimum and maximum of the y axis to keep the vertical scale constant across cells.


The output is: 


Example 7 
Smoothers 


In this example, we apply the inverse and step smoothers to the LABOR data. 
The input is:

BEGIN
USE LABOR
SPLOM US .. ENGLAND * YEAR / SMOOTH=INVERSE SIZE=0,
YMIN=0 YMAX=200,
LOC=-1IN,0IN
SPLOM US .. ENGLAND * YEAR / SMOOTH=STEP SIZE=0,
YMIN=0 YMAX=200,
LOC=1IN,0IN
END


The output is: 


[Figure: SPLOMs of US, CANADA, JAPAN, GERMANY, and ENGLAND against YEAR with inverse smoothers (left) and step smoothers (right)]


LOWESS 


Here are examples of the LOWESS smooth using two sets of variables from the 
OURWORLD file that contains data for 57 countries. In the first SPLOM, the data are 
plotted as recorded; in the second SPLOM, we log transform a set of column variables 
(gross domestic product and expenditures for military, health, and education). 
The input is:

BEGIN
USE OURWORLD
SPLOM LITERACY URBAN LIFEEXPF BABYMORT,
* GDP_CAP MIL EDUC HEALTH / SMOOTH=LOWESS,
SYMBOL=1 SIZE=.7,
LOC=-2IN,0IN FILL=1
SPLOM LITERACY URBAN LIFEEXPF BABYMORT,
* GDP_CAP MIL EDUC HEALTH / XLOG FILL=1,
SMOOTH=LOWESS,
SYMBOL=1 FILL=1,
LOC=3.5IN,0IN
END


The output is: 


After the log transformations, the linear relations appear stronger. 


Example 8 
Regression Lines with Confidence Bands 


You can plot confidence bands on the regression lines in each cell by specifying the 
size of the confidence interval. For example, if we specify 0.90, SYSTAT draws upper 
and lower hyperbolic bands around the fitted line. These bands mean the following: if 
the discrepancies (residuals) between the fitted and observed values for y at each x are 
normally distributed and independent of each other and have the same spread 
(variance), then 90 times out of 100, confidence intervals constructed by SYSTAT will 
cover the true regression line relating y to x. 
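The hyperbolic shape of the bands comes from the standard error of the fitted line, which grows with the distance of x from its mean. The Python sketch below computes a pointwise band for the mean of y at one x value, under the normal, independent, equal-variance assumptions just stated. It is an illustration, not SYSTAT's code: the function name is hypothetical, and the critical value is passed in (the Python standard library has no t quantile function), so whether SYSTAT uses a pointwise t multiplier or a larger simultaneous one is left open.

```python
import math


def linear_fit_band(xs, ys, x0, t_crit):
    """Least-squares line and a pointwise confidence interval for the mean of y at x0.

    Returns (lower, fitted, upper). t_crit is the Student t critical value with
    n - 2 degrees of freedom for the chosen confidence level.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(resid_ss / (n - 2))  # residual standard error
    fitted = intercept + slope * x0
    # The (x0 - mx)**2 term is what makes the band flare out hyperbolically.
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
    return fitted - half_width, fitted, fitted + half_width


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
# 2.353 is the upper 5% point of t with 3 df, for a two-sided 90% interval.
print(linear_fit_band(xs, ys, 3.0, t_crit=2.353))
print(linear_fit_band(xs, ys, 5.0, t_crit=2.353))
```

The band is narrowest at the mean of x and widens toward the ends of the data.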
The input is:

USE RCITY
LET BIG_MAC = L10(BIG_MAC)
SPLOM WORKWEEK EARNINGS PCTTAXES BIG_MAC / ,
SMOOTH=LINEAR CONFI,
HALF FILL=.75


The output is: 


[Figure: lower-half scatterplot matrix of WORKWEEK, EARNINGS, PCTTAXES, and BIG_MAC with regression lines and confidence bands]


There appears to be no relation between BIG_MAC and PCTTAXES (bottom row): the
regression line could be rotated to a horizontal position without crossing the edges of
the confidence band. This is not the case for BIG_MAC versus EARNINGS.


Example 9 


Influence SPLOM 
The influence of a point in a scatterplot on the correlation coefficient is the amount the
correlation would change if that point were deleted. Plotting influences can help us
determine whether a linear fit to the scatterplot is relatively robust or is dependent on
just a few points. Selecting Influence on correlation coefficient makes the size of the
plot symbol represent the extent of influence from each point. Positive influences are
represented by hollow symbols and negative influences by filled symbols.
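The definition above translates directly into a leave-one-out computation. The Python sketch below returns, for each point, the correlation of all points minus the correlation with that point deleted. The function names are hypothetical, and the sign convention (positive means the point raises the correlation) is our assumption; the chapter does not say which difference SYSTAT reports.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_influence(xs, ys):
    """For each point: r using all points minus r with that point deleted.

    Sign convention (an assumption): positive means the point raises r.
    """
    r_all = pearson(xs, ys)
    influences = []
    for i in range(len(xs)):
        r_without = pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        influences.append(r_all - r_without)
    return influences


# Hypothetical data: four points on a rising line and one discrepant point.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [1.2, 1.9, 3.1, 3.9, 1.0]
print([round(v, 3) for v in correlation_influence(xs, ys)])
```

The discrepant last point has by far the largest influence, which is the kind of point an influence SPLOM makes visible at a glance.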

Other types of influence or leverage on statistical estimators can be represented by
using statistics computed with items from Linear Regression.
The input is:
USE RCITY
LET BIG_MAC = L10(BIG_MAC)
SPLOM WORKWEEK EARNINGS PCTTAXES BIG_MAC / INFLUENCE COLOR=BLACK,
LEGEND=NONE
The output is: 
Example 10 


Confidence Ellipses for Sample and Centroid 


You can draw Gaussian bivariate ellipses for the sample in each plot or Gaussian
bivariate confidence regions for the centroid. The difference between ELL and ELM is
analogous to the difference between the standard deviation and the standard error of the mean.
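The ELL/ELM analogy can be made concrete. Both ellipses come from the same 2 x 2 covariance matrix, but the centroid version divides it by n, so its axes are shorter by a factor of the square root of n, just as the standard error is the standard deviation divided by the square root of n. The Python sketch below is an illustration under assumptions: the function name is hypothetical, and the coverage level and the chi-square (2 df) radius are our choices, not documented SYSTAT defaults.

```python
import math


def ellipse_axes(xs, ys, coverage=0.68, centroid=False):
    """Semi-axis lengths and major-axis angle (radians) of a bivariate normal ellipse.

    centroid=False describes the spread of the sample (like ELL); centroid=True
    divides the covariance matrix by n, describing uncertainty about the mean
    (like ELM). The scale factor is the square root of the chi-square quantile
    with 2 degrees of freedom, sqrt(-2 ln(1 - coverage)).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if centroid:
        sxx, syy, sxy = sxx / n, syy / n, sxy / n
    # Eigenvalues of the 2 x 2 covariance matrix [[sxx, sxy], [sxy, syy]].
    centre = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    big, small = centre + radius, centre - radius
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c = math.sqrt(-2 * math.log(1 - coverage))
    return c * math.sqrt(big), c * math.sqrt(max(small, 0.0)), angle


# Hypothetical, positively correlated data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
print(ellipse_axes(xs, ys))
print(ellipse_axes(xs, ys, centroid=True))
```

Both ellipses share the same orientation; only their size differs.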

Here we use sample ellipses to characterize the same relationships described with 
regression lines and confidence bands. 
The input is:

BEGIN
USE RCITY
LET BIG_MAC = L10(BIG_MAC)
SPLOM WORKWEEK EARNINGS PCTTAXES BIG_MAC / HALF ELL FILL=.75,
LOC=-2IN,0IN,
TITLE='The sample'
SPLOM WORKWEEK EARNINGS PCTTAXES BIG_MAC / HALF ELM FILL=.75,
LOC=3IN,0IN,
TITLE='The centroid'
END


The output is: 


[Figure: lower-half scatterplot matrices of WORKWEEK, EARNINGS, PCTTAXES, and BIG_MAC, titled "The sample" (sample ellipses) and "The centroid" (centroid ellipses)]


Example 11 
Convex Hulls and Kernel Contours 


Confidence ellipses for the sample work well for bivariate normal distributions but can
be misleading when the data depart from normality. Here, we use the OURWORLD data,
which have distinct subpopulations. We request a nonparametric 65% kernel density
contour and a convex hull.
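A convex hull is the smallest convex polygon that encloses all the points in a cell. The Python sketch below uses Andrew's monotone chain algorithm, which is our choice for illustration; the chapter does not say how SYSTAT computes its hulls, and the function name is hypothetical. Unlike a kernel contour, the hull is pulled outward by any single extreme point, which is one reason the two displays can look so different.

```python
def convex_hull(points):
    """Vertices of the convex hull in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counterclockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# A square with one interior point: the interior point is not a hull vertex.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```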
The input is:
BEGIN
USE OURWORLD
SPLOM BABYMORT URBAN BIRTH_RT DEATH_RT,
GDP_CAP LITERACY / HALF KERNEL=.65 FILL=.75,
LOC=-2IN,0IN TITLE='Kernel'
SPLOM BABYMORT URBAN BIRTH_RT DEATH_RT,
GDP_CAP LITERACY / HALF HULL FILL=.75,
LOC=2.75IN,0IN TITLE='Hull'
END
The output is: 
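[Figure: lower-half scatterplot matrices of BABYMORT, URBAN, BIRTH_RT, DEATH_RT, GDP_CAP, and LITERACY, titled "Kernel" (65% kernel contours) and "Hull" (convex hulls)]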
The organic kernel contours clearly highlight the presence of subpopulations. 
Example 12 
Transformations 


SPLOMs are helpful for assessing the effect of transforming several variables. If you
have two sets of variables, as in

SPLOM A B C * D E F G

use the y axis transformations YPOW and YLOG to transform variables A, B, and C. Use
XLOG and XPOW to transform variables D, E, F, and G. These options are useful for
transforming every variable in a set of variables. Here are economic indicators from the 
OURWORLD data, plotted as recorded and after a log (base 10) transformation. 
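To see why a log transformation helps with right-skewed economic measures, the Python sketch below compares a simple moment-based skewness statistic before and after taking base-10 logarithms. The values are hypothetical, loosely shaped like per capita GDP, and the function name is illustrative.

```python
import math


def skewness(values):
    """Moment skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical per capita GDP values spanning two orders of magnitude.
gdp = [120, 250, 400, 900, 1500, 3000, 8000, 15000, 20000]
logged = [math.log10(v) for v in gdp]
print(round(skewness(gdp), 2), round(skewness(logged), 2))
```

The raw values have a long right tail; after the log transformation the skewness is close to zero, which is why relationships in the transformed SPLOM are easier to judge.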
The input is:

BEGIN
USE OURWORLD
SPLOM GDP_CAP EDUC MIL HEALTH / HALF SYM=1 SIZE=.8,
FILL=.9 LOC=-2IN,0IN
SPLOM GDP_CAP EDUC MIL HEALTH / HALF XLOG SYM=1 SIZE=.8,
FILL=.9 LOC=3IN,0IN
END


The output is: 


LET Statements 


If you do not want to apply the same transformation to all the variables in a set, use a
LET statement. Here we use URBAN, the percentage of the population living in cities,
instead of GDP_CAP. As its histogram in the output shows, the distribution of URBAN
is fairly symmetric, so no transformation is needed. Use the L10 function to log-transform
the three economic variables. In the LET statement, the shortcut notation @ stands for
each of the three variables listed on the left.


The input is: 


USE OURWORLD 

SPLOM EDUC MIL HEALTH URBAN / HALF SYM=1, 
SIZE=.8 FILL=.75,
SMOOTH=LOWESS

LET (EDUC,MIL,HEALTH) = L10(@)

SPLOM EDUC MIL HEALTH URBAN / HALF SYM=1,
SIZE=.8 FILL=.75,
SMOOTH=LOWESS 
The output is:


Example 13 
Bubble SPLOM 


You can use the values of a variable in your file to control the size of the plot
symbols. This feature is useful for displaying the value of an additional variable against
each pair of variables in a SPLOM. You should usually use empty symbols with this type
of plot, because filled symbols can hide one another, making the plot difficult to interpret.

In this example, we make a new variable GDP_DEN using GDP_CAP from the 
OURWORLD data file. Its values will control the size of each plot symbol and will range 
from 0 to 4. 
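The LET statement in this example maps GDP_CAP to symbol sizes. Reading SQR as the square-root function (an assumption about the SYSTAT function name) and assuming that SIZE scales a symbol's diameter, the square root makes symbol area, rather than diameter, proportional to GDP_CAP: a country with four times the GDP gets a symbol with four times the area. A Python sketch of that arithmetic, with a hypothetical function name:

```python
import math


def bubble_size(gdp_cap, max_size=4.0, top=20000.0):
    """Symbol size following LET GDP_DEN = 4*SQR(GDP_CAP/20000), reading SQR as sqrt."""
    return max_size * math.sqrt(gdp_cap / top)


# Sizes for hypothetical GDP_CAP values; 20000 maps to the maximum size of 4.
for gdp in (500, 5000, 20000):
    print(gdp, round(bubble_size(gdp), 2))
```

Any country with GDP_CAP above 20000 would get a size above 4.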


The input is: 


USE OURWORLD 

LET GDP_DEN=4*SQR(GDP_CAP / 20000) 

SPLOM URBAN LITERACY BABYMORT / SIZE=GDP_DEN, 
SYMBOL=1 HALF, 
LEGEND=NONE 
The output is:

[Figure: lower-half bubble SPLOM of URBAN, LITERACY, and BABYMORT, with symbol size given by GDP_DEN]


From the plot points with large circles, we see that in countries with a high gross 
domestic product, the population is highly literate and fairly urban, and the infant 
mortality rate is quite low. 


Creating SUBWRLD2 


Many of our icon plot examples use a permutation of the SUBWORLD data file, which
contains one case for each of 30 countries. The variables are standardized, and the cases
are sorted in descending GDP_CAP order. Because the distributions of several of our
variables are severely right-skewed, we transform them to log base 10 units to symmetrize
the distributions before they are standardized.


The input is: 


USE SUBWORLD 

LET (GDP_CAP,EDUC,MIL,HEALTH) = L10(@)
STAND / SD 

DSAVE SUBWRLD2 

SORT GDP_CAP / D 
For the SUBWRLD2 data, we select only cases with complete values for the variables
by using the NUM function. For each case, NUM counts the number of variables with data
present.

SELECT NUM(GDP_CAP, EDUC, HEALTH, MIL, LIFEEXPF,,
LIFEEXPM, LITERACY, URBAN, DEATH_RT,,
BIRTH_RT, BABYMORT, B_TO_D)=12


A case is used only if its count of data values equals 12 (the number of variables that 
NUM scans). 
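The same complete-case filter is easy to express outside SYSTAT. In the Python sketch below, each case is a dictionary and a missing value is None or NaN; those representations, the function names, and the sample cases are assumptions made for illustration.

```python
import math


def num_present(values):
    """Count the values that are present (not None and not NaN), like NUM for one case."""
    return sum(
        1 for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )


def complete_cases(cases, variables):
    """Keep only cases whose count of present values equals the number of variables."""
    return [c for c in cases if num_present([c.get(v) for v in variables]) == len(variables)]


# Hypothetical cases: only "A" has values for both variables.
cases = [
    {"COUNTRY$": "A", "GDP_CAP": 1.2, "EDUC": 0.4},
    {"COUNTRY$": "B", "GDP_CAP": None, "EDUC": 0.1},
    {"COUNTRY$": "C", "GDP_CAP": -0.3, "EDUC": float("nan")},
]
print([c["COUNTRY$"] for c in complete_cases(cases, ["GDP_CAP", "EDUC"])])
```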


Example 14 
Histogram Icons 


The histogram icon assigns one histogram bar (left to right) to each variable in the 
order that you select them. The bars are scaled so that the largest value in the data is 
the tallest bar and the smallest value is the shortest. Freni-Titulaer, Lambertina, Louv,
and William (1984) show examples of this icon.
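The scaling rule just described uses one minimum and one maximum for the whole data set rather than one per variable. The Python sketch below assumes a linear mapping and expresses heights as fractions of the tallest possible bar; the function name is hypothetical. It shows why a variable measured in large units, such as unstandardized GDP, claims the top of the scale and flattens every other bar.

```python
def histogram_icon_heights(cases):
    """Bar heights in [0, 1] for each case, scaled by the global min and max of all values."""
    values = [v for case in cases for v in case]
    lo, hi = min(values), max(values)
    return [[(v - lo) / (hi - lo) for v in case] for case in cases]


# Two hypothetical cases with two variables each; the second variable is on a much
# larger scale, so the first variable's bars are nearly flat in both icons.
print(histogram_icon_heights([[1.0, 50.0], [3.0, 20.0]]))
```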

We use the data stored in the SUBWRLD2 file. Each variable is standardized, and the
cases are sorted by GDP_CAP, gross domestic product per capita, with high values
first. The SELECT NUM statement guarantees that only cases with complete data are
used.


The input is: 


USE SUBWRLD2 
SELECT NUM(GDP_CAP, EDUC, HEALTH, MIL, LIFEEXPF, LIFEEXPM,,
LITERACY, URBAN, DEATH_RT, BIRTH_RT, BABYMORT,,
B_TO_D)=12
ICON GDP_CAP EDUC HEALTH MIL LIFEEXPF LIFEEXPM,
LITERACY URBAN DEATH_RT BIRTH_RT BABYMORT,
B_TO_D / HIST LABEL=COUNTRY$ FILL=.75
The output is:

[Figure: histogram icons labeled Canada, Switzerland, Sweden, WGermany, France, Denmark, UK, Italy, Austria, Czechoslov, Libya, Poland, Uruguay, Brazil, Argentina, Cuba, Chile, Iraq, Panama, Guatemala, ElSalvador, Ecuador, Peru, and Haiti]


The first bar corresponds to the variable GDP_CAP (gross domestic product per
capita) and the last bar to B_TO_D (birth to death rate ratio). The first four variables in
each icon are a set of highly correlated economic measures. Notice that Canada and the
European countries (toward the top of the display) have high values for these
variables, while the countries toward the bottom of the display have low values. The
opposite is true for the last four variables, which measure birth and death rates.


Unstandardized Results 


The following display uses data that are not standardized. 
The input is:
USE SUBWORLD
SORT GDP_CAP / D
ICON GDP_CAP EDUC HEALTH MIL LIFEEXPF LIFEEXPM,
LITERACY URBAN DEATH_RT BIRTH_RT BABYMORT,B_TO_D /,
HIST LABEL=COUNTRY$,
FILL=1
The output is: 
[Figure: histogram icons of the unstandardized SUBWORLD data, labeled by country, with blank spaces for cases that have missing values]


Notice how the large values of gross domestic product swamp the other values. Empty 
spaces are created where case values are missing. 


Controlling the Number of Columns and Rows 


SYSTAT determines the number of rows and columns for displaying your icons. If you
set the number of columns, SYSTAT draws as many rows as necessary to cover all your
cases, or vice versa. If you set both the number of rows (r) and the number of columns
(c), SYSTAT draws only the first r * c icons. Here, COL=7 displays the histograms across
seven columns.
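The row and column rules above amount to simple ceiling arithmetic. The Python sketch below, with a hypothetical function name, returns the grid size and the number of icons drawn. When neither dimension is given, SYSTAT picks the layout itself; the near-square grid used here for that case is only a placeholder guess.

```python
import math


def icon_grid(n_cases, rows=None, cols=None):
    """Return (rows, cols, icons_drawn) for a panel of n_cases icons."""
    if rows and cols:
        # Both fixed: only the first rows * cols icons are drawn.
        return rows, cols, min(n_cases, rows * cols)
    if cols:
        # Columns fixed: add rows until every case has a cell.
        return math.ceil(n_cases / cols), cols, n_cases
    if rows:
        # Rows fixed: add columns until every case has a cell.
        return rows, math.ceil(n_cases / rows), n_cases
    # Neither fixed: SYSTAT chooses; a near-square grid is only a placeholder guess.
    cols = math.ceil(math.sqrt(n_cases))
    return math.ceil(n_cases / cols), cols, n_cases


# 24 hypothetical cases.
print(icon_grid(24, cols=7))          # (4, 7, 24): four rows are needed
print(icon_grid(24, rows=2, cols=5))  # (2, 5, 10): only the first 10 icons
```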


The input is: 


USE SUBWRLD2 
SELECT NUM(GDP_CAP, EDUC, HEALTH, MIL, LIFEEXPF, LIFEEXPM,,
LITERACY, URBAN, DEATH_RT, BIRTH_RT, BABYMORT,,
B_TO_D)=12
ICON GDP_CAP EDUC HEALTH MIL LIFEEXPF LIFEEXPM,
LITERACY URBAN DEATH_RT BIRTH_RT BABYMORT,
B_TO_D / HIST LABEL=COUNTRY$ FILL=1 COL=7


The output is: 




Example 15 
Fourier Blobs 


Fourier blobs are polar coordinate Fourier wave forms. Fourier functions have the 
following form: 
$$ f(t) = \frac{y_1}{\sqrt{2}} + y_2 \sin(t) + y_3 \cos(t) + y_4 \sin(2t) + y_5 \cos(2t) + \cdots $$


where y is a p-dimensional variate and t varies from -3.14 to 3.14 (π radians on either
side of 0). The result of this transformation is a set of wave forms made up of sine and
cosine components. Each wave form corresponds to one case in the data. Cases that 
have similar values across all variables have comparable wave forms; cases with 
different patterns of variation have contrasting wave forms. When these wave forms 
are transformed into polar coordinates, they look like blobs or amoebae. The 
information contained in Fourier blobs is therefore identical to that of the Andrews's 
Fourier Plot. The advantage of blobs is that they do not overlap, and they can be used
as plot symbols in a two-dimensional plot.
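The Python sketch below evaluates the Fourier function for one case and then wraps it around a circle to produce a blob outline. It assumes the first term is y1 divided by the square root of 2, as in Andrews's usual definition. The radius offset that keeps the outline positive is a hypothetical parameter, since the chapter does not say how SYSTAT maps f(t) to a radius, and the function names are illustrative.

```python
import math


def andrews(y, t):
    """Fourier function for one case y = (y1, ..., yp) at angle t:
    y1/sqrt(2) + y2 sin t + y3 cos t + y4 sin 2t + y5 cos 2t + ...
    """
    total = y[0] / math.sqrt(2)
    for j, value in enumerate(y[1:], start=1):
        k = (j + 1) // 2  # frequency: 1, 1, 2, 2, 3, 3, ...
        total += value * (math.sin(k * t) if j % 2 == 1 else math.cos(k * t))
    return total


def blob_outline(y, n_points=72, offset=3.0):
    """Points of a blob: radius = offset + f(t) for t from -pi up to (not including) pi.

    `offset` is hypothetical; it must exceed the largest negative value of f(t).
    """
    outline = []
    for i in range(n_points):
        t = -math.pi + 2 * math.pi * i / n_points
        r = offset + andrews(y, t)
        outline.append((r * math.cos(t), r * math.sin(t)))
    return outline


outline = blob_outline([0.5, -1.0, 0.3, 0.8], n_points=8)
print([(round(x, 2), round(y, 2)) for x, y in outline])
```

Because the frequency of each term rises with its position in the list, the earlier variables set the broad shape of the blob and the later ones add finer wiggles, matching the note on variable order that follows.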

How do you interpret these blobs? What are the variable values? Keep in mind that 
the point of the icon is not to translate back to numerical values. The Fourier 
transformation is too complex for us to compute those values mentally. Instead, look 
for similar blobs and then go back to the raw data to examine actual values with other 
types of graphs. 

The shape of the blobs depends on the order in which you select the variables. 
Variables selected earlier are weighted with lower frequency components in the above 
equation and later ones with higher frequency. 


The input is: 


USE SUBWRLD2 
SELECT NUM(GDP_CAP, EDUC, HEALTH, MIL, LIFEEXPF, LIFEEXPM,,
LITERACY, URBAN, DEATH_RT, BIRTH_RT, BABYMORT,,
B_TO_D)=12
ICON GDP_CAP EDUC HEALTH MIL LIFEEXPF LIFEEXPM,
LITERACY URBAN DEATH_RT BIRTH_RT BABYMORT,
B_TO_D / BLOB LABEL=COUNTRY$ FILL=.75
The output is:


Example 16 
Chernoff's Faces 


Faces are one of the most effective graphical icons for visually clustering multivariate 
data, particularly for long-term memory processing. Chernoff introduced the idea of 
using a cartoon face to represent many variables. Wang (1978) contains a number of 
articles on applications of faces to multivariate data. Wilkinson (1982) showed that
faces can be more effective than many other icons for similarity comparisons.
Reviewers have criticized the faces for their arbitrary assignment of variables to
features on the face. Jacob (1983) addresses this problem. Here is the order of features 
assigned to the variables by Chernoff's Faces: 


 1  Curvature of mouth           11  Half-length of eyes
 2  Angle of brow                12  Position of pupils
 3  Width of nose                13  Height of eyebrow
 4  Length of nose               14  Length of brow
 5  Length of mouth              15  Height of face
 6  Height of center of mouth    16  Eccentricity of upper ellipse of face
 7  Separation of eyes           17  Eccentricity of lower ellipse of face
 8  Height of center of eyes     18  Ear level
 9  Slant of eyes                19  Radius of ear
10  Eccentricity of eyes         20  Hair length


If you select variables A, B, C, and D in that order, for example, A, B, C, and D are 
assigned to the first four features in the list. You may select variables more than once 
to make multiple assignments (for example, A, A, B, C), but we would do this only for 
correlated features, such as mouth curvature and brow tilt. 


Skipping facial features. To skip features, generate a variable containing a constant
(K). If you are using SD-standardized variables, set the constant to 0 to make the
features neutral; if you are using Range-standardized variables, set it to 0.5. Adding
this variable to the list of Feature variables then causes SYSTAT to skip the feature
whose place it takes. For example, suppose you want to assign GDP_CAP to the
curvature of the mouth and EDUC to the length of the nose. If K is your new variable,
first add GDP_CAP to the list of Feature variables, then add K twice, and finally select
EDUC. If you select more than 20 variables, SYSTAT reports an error. In the following
example there are 20 facial features but only 12 variables, so we make multiple
assignments.
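The skipping recipe amounts to padding the variable list with the constant wherever a feature should stay neutral. Here is a minimal sketch, assuming the constant variable is named K as in the text. `feature_list` and `neutral_value` are hypothetical helpers, and positions follow the 1-based numbering of the feature table.

```python
def feature_list(wanted, filler="K"):
    """Build an ICON variable list from {feature_position: variable}.

    Positions are 1-based, as in the feature table. Every unwanted
    position before the highest requested one gets the constant variable.
    """
    if not wanted:
        return []
    if min(wanted) < 1 or max(wanted) > 20:
        raise ValueError("feature positions must be between 1 and 20")
    return [wanted.get(pos, filler) for pos in range(1, max(wanted) + 1)]


def neutral_value(standardization):
    """Neutral constant: 0 for SD-standardized data, 0.5 for Range."""
    return {"SD": 0.0, "RANGE": 0.5}[standardization.upper()]


# Mouth curvature is feature 1, nose length is feature 4.
print(feature_list({1: "GDP_CAP", 4: "EDUC"}))
```

For the GDP_CAP and EDUC example, the sketch produces GDP_CAP, K, K, EDUC, the same order the text prescribes.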


The input is: 


USE SUBWRLD2
LET K = 0
SELECT NUM(GDP_CAP, EDUC, HEALTH, MIL, LIFEEXPF, LIFEEXPM, ,
LITERACY, URBAN, DEATH_RT, BIRTH_RT, BABYMORT, B_TO_D) = 12
ICON GDP_CAP EDUC HEALTH MIL GDP_CAP B_TO_D URBAN,
LITERACY LITERACY DEATH_RT DEATH_RT BABYMORT,
URBAN K URBAN LIFEEXPF LIFEEXPM BIRTH_RT B_TO_D,
LITERACY / FACE LABEL=COUNTRY$
The output is:


In this display, gross domestic product per capita, GDP_CAP, determines the length of
the mouth. Compare the mouth lengths for countries in the first row with those for
countries in the bottom row. The values of URBAN determine how far apart the eyes
are; compare Germany with Ethiopia. LITERACY, the 20th variable we select,
determines hair length. Haiti, Guinea, and Somalia have the shortest hair.


Example 17 

Profile Icons 
Chambers, Cleveland, Kleiner, and Tukey (1983) discuss profile icons. They are
identical to histogram icons except that the tops of the bars are connected by lines and
the bars are not drawn. Star plots are profile icons drawn in polar coordinates. To our
knowledge, there is no research showing which of the three (histogram, profile, or star
icons) is better.

SYSTAT assigns one profile point to each variable, in the order in which you select the
variables. Profiles can be improved by selecting the variables in a rational order. As
with the other icons, you should make sure that the variables are on similar scales.
Otherwise, one point will be high in all the icons and the rest will be barely visible.
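Putting variables on similar scales usually means standardizing them first. The text does not give SYSTAT's formulas, so this sketch assumes the conventional ones. SD standardization subtracts the mean and divides by the sample standard deviation. Range standardization maps each variable's minimum to 0 and its maximum to 1. Either one puts variables measured in very different units onto comparable scales before they are drawn as icons.

```python
from statistics import mean, stdev


def standardize_sd(values):
    """Z-scores: (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


def standardize_range(values):
    """Rescale linearly so that the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


# Two illustrative variables on very different scales end up comparable.
print(standardize_range([100.0, 300.0, 500.0]))
print(standardize_range([2.0, 4.0, 6.0]))
print(standardize_sd([2.0, 4.0, 6.0]))
```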


The input is: 


USE SUBWRLD2
SELECT NUM(GDP_CAP, EDUC, HEALTH, MIL, LIFEEXPF, LIFEEXPM, ,
LITERACY, URBAN, DEATH_RT, BIRTH_RT, BABYMORT, ,
B_TO_D) = 12
ICON GDP_CAP EDUC HEALTH MIL LIFEEXPF LIFEEXPM,
LITERACY URBAN DEATH_RT BIRTH_RT BABYMORT,
B_TO_D / PROFILE LABEL=COUNTRY$ FILL=.5


The output is:

[Figure: profile icons, one per country, from Canada, Switzerland, and Sweden through Guinea, Somalia, and Ethiopia.]
Example 18
Star Plots


Star icons are profile icons in polar coordinates. Imagine, for example, that you have
12 variables. For each case, draw a clock with 12 hands, one pointing to each hour,
where the length of each hand is determined by the value of a variable. Next, draw a
line connecting the tips of the hands. Finally, erase the clock.
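The clock recipe translates directly into coordinates. This sketch follows the clock image and assumes that the first hand points to 12 o'clock and that later hands proceed clockwise at equal angles. SYSTAT's actual starting angle and direction are not stated here. Values are assumed to be non-negative, for example after Range standardization. `star_vertices` is a hypothetical helper.

```python
import math


def star_vertices(values):
    """Return the (x, y) tips of the clock hands for one case.

    Hand i points at angle 2*pi*i/n, measured clockwise from 12 o'clock
    (an assumed convention), and its length is the i-th value. Joining
    the tips in order, and then back to the first tip, draws the star.
    """
    n = len(values)
    tips = []
    for i, r in enumerate(values):
        theta = 2 * math.pi * i / n
        # Clockwise from vertical: sin gives x, cos gives y.
        tips.append((r * math.sin(theta), r * math.cos(theta)))
    return tips


tips = star_vertices([1.0] * 12)
print(tips[0], tips[3], tips[6])  # the 12, 3, and 6 o'clock hands
```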

A star icon can represent up to 200 variables (hands), but you will rarely want more
than a few. As with the other icons, it is helpful to order the variables around the
circle (clock) so that correlated variables are near each other.


The input is: 


USE SUBWRLD2
SELECT NUM(GDP_CAP, EDUC, HEALTH, MIL, LIFEEXPF, LIFEEXPM, ,
LITERACY, URBAN, DEATH_RT, BIRTH_RT, BABYMORT, ,
B_TO_D) = 12
ICON GDP_CAP EDUC HEALTH MIL LIFEEXPF LIFEEXPM,
LITERACY URBAN DEATH_RT BIRTH_RT BABYMORT,
B_TO_D / STAR LABEL=COUNTRY$ FILL=1


The output is:

[Figure: star icons, one per country, beginning with Canada, Switzerland, Sweden, W Germany, and France.]


Although the variables you specify determine the length of the “hands,” the area of the
resulting icon may be misleading, because the area grows with the square of the hand
lengths. To dampen the effect that the variables have on the area, you can take the
square root of each measure before making the plot.
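The square-root advice follows from geometry. With n equally spaced hands of lengths r1, ..., rn, the star consists of n triangles, each with area ½·r_i·r_{i+1}·sin(2π/n). Multiplying every value by 2 therefore multiplies the area by 4. The sketch below checks this, and shows that after a square-root transform the area grows only in proportion to the data. The `star_area` helper is illustrative, not part of SYSTAT.

```python
import math


def star_area(lengths):
    """Area of a star icon with equally spaced hands of the given lengths.

    The wedge between hands i and i+1 is a triangle with sides r_i and
    r_{i+1} and included angle 2*pi/n, so its area is
    0.5 * r_i * r_{i+1} * sin(2*pi/n).
    """
    n = len(lengths)
    wedge = math.sin(2 * math.pi / n)
    return 0.5 * wedge * sum(lengths[i] * lengths[(i + 1) % n] for i in range(n))


values = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0]
doubled = [2 * v for v in values]

# Raw values: doubling the data quadruples the area.
print(star_area(doubled) / star_area(values))

# Square-root transform: doubling the data roughly doubles the area.
print(star_area([math.sqrt(v) for v in doubled]) /
      star_area([math.sqrt(v) for v in values]))
```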


Example 19 
Rectangular Icons 


Cleveland and McGill (1985) discussed framed rectangles, an icon that represents a
single variable. Framed rectangles are like thermometers that show the “temperature”
of a single variable for each case. Following Dunn (1987), we have extended them to
represent up to three variables by varying the mercury level, the width of each
thermometer, and the height of each thermometer. The first variable you specify
corresponds to the mercury; the second, the width; and the third, the height. As Dunn
points out, counts and sample sizes are particularly appropriate as width variables. The
width variable must be between 0 and 1; values of 1 or larger produce maximum-width
rectangles. We use the USINCOME data and standardize the values to a 0,1 scale:


DIVISIONS       COUNT   AVERAGE INCOME
New England       6        33,626.5
Mid Atlantic      3        34,070.0
S Atlantic        8        29,700.1
EN Central        5        30,402.6
ES Central        4        23,009.5
WN Central        7        27,904.7
WS Central        4        25,482.2
Mountain          8        28,715.2
Pacific           5        35,136.4

The input is: 


USE USINCOME 
STAND / SD 
ICON INCOME COUNT / THERM LABEL=DIVISIONS 
The output is:

[Figure: framed-rectangle (thermometer) icons for New England, Mid Atlantic, S Atlantic, EN Central, E S Central, WN Central, W S Central, Mountain, and Pacific.]


The width of each rectangle represents the number of states in the subsample; the 
“mercury height” represents the average income. 


Example 20 
Arrow Plots 


Arrow icons represent two variables only. The first variable is assigned to the length of
the arrow, the second to its direction (from 0 to 360 degrees). Both 0 and 360 degrees
point vertically. For length, the minimum value of the data is set to zero length and the
maximum value to a scaled length; the arrowhead is not rescaled. For direction, the
minimum and maximum values in the data are scaled to 0 and 360 degrees, respectively.
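The two scaling rules are linear maps from the data ranges. In this sketch, `max_len` stands in for the unspecified “scaled length,” and angles are assumed to run clockwise from straight up. The text states only that 0 and 360 degrees are vertical, so the clockwise direction is an assumption. `arrow` is a hypothetical helper.

```python
import math


def arrow(length_value, direction_value, length_range, direction_range, max_len=1.0):
    """Return the (dx, dy) offset of one arrow icon.

    Length: the data minimum maps to 0 and the maximum to max_len.
    Direction: the data minimum maps to 0 degrees and the maximum to 360,
    with 0 (and 360) pointing straight up; angles are assumed to increase
    clockwise.
    """
    lmin, lmax = length_range
    dmin, dmax = direction_range
    length = max_len * (length_value - lmin) / (lmax - lmin)
    degrees = 360.0 * (direction_value - dmin) / (dmax - dmin)
    rad = math.radians(degrees)
    return length * math.sin(rad), length * math.cos(rad)


# Hypothetical data ranges: length variable 0 to 10, direction variable 0 to 40.
print(arrow(10.0, 0.0, (0.0, 10.0), (0.0, 40.0)))   # longest arrow, straight up
print(arrow(10.0, 20.0, (0.0, 10.0), (0.0, 40.0)))  # mid-range direction: straight down
```

Under these assumptions the smallest direction value (for example, the coldest winter temperature) yields an arrow that points straight up, which matches how the North Dakota arrow is read in the example that follows.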

Arrow plots are often used to show fluid flow over a surface. You can overlay them 
on a scatterplot or function plot, for example, by specifying variables defining the icon 
location. 

Here is an example that shows summer and winter temperatures as the length and
direction of the arrows, respectively. The data have not been standardized.


The input is: 


USE US 

ICON SUMMER WINTER / ARROW ILOC=LABLON, LABLAT, 
HEIGHT=2IN WIDTH=4IN, 
LABEL=STATE$ 


The output is:


Summer temperatures do not vary substantially (the lengths of the lines are similar). 
On the other hand, average winter temperatures (represented by the direction of the 
arrow) show substantial variation. Notice, for example, that the arrow for North 
Dakota points almost straight up, indicating a low average winter temperature. 
Compare this to the arrows for Florida and Arizona. 


Example 21 
Weather Vanes Plot 


Weather Vane plots are used to represent three variables. The first determines the 
radius of the central circle. The second variable determines the length of the vane, and 
the third, its direction. See the Arrow plots example for information concerning scaling 
of the length and direction of the vector. 


This vane plot is similar to the arrow plot, but adds circles that represent average 
rainfall. 


The input is: 


USE US 

ICON RAIN SUMMER WINTER / VANE ILOC=LABLON, LABLAT,
HEIGHT=2IN WIDTH=4IN,
LABEL=STATE$


The output is:


Notice that rainfall is concentrated on the coasts and in the Southeast.


Example 22 
Icon-Enhanced Maps 


Wainer and Thissen (1981) and Chambers et al. (1983) discuss using icons to enhance
scatterplots by placing them at the location of x and y variables. Icons can be placed on
maps, for example, or on scatterplots of two related variables. Use icon location (ILOC)
to specify the variables that determine the position of the icons. You may need to adjust
the symbol size to keep icons from colliding.

Use the x axis and y axis minimum and maximum to control the axis limits when
you use Icon Location. They work the same way they do for Scatterplot. Since axes are
not drawn when you use icons this way, you may wonder why these limits are
necessary. They allow you to combine maps, plots, and other graphs with icons in a
single display and ensure that the icons are correctly located. This example shows a
plot of the western region of the United States.


The input is:


USE USSTATES
STAND PULMONAR, DIVORCE, MSTROKE, FSTROKE, GOVSLRY,
TEACHERS, CARDIO, MATH, VERBAL / RANGE
LET K=.5
SELECT REGION$='WEST'
BEGIN
MAP / PROJECT=STEREO
ICON GOVSLRY GOVSLRY K K GOVSLRY K,
MATH VERBAL TEACHERS TEACHERS,
K CARDIO CARDIO K PULMONAR FSTROKE,
MSTROKE K K DIVORCE / FACE PROJECT=STEREO,
ILOC=LABLON, LABLAT,
SIZE=.4 LABEL=STATE$
END


The output is: 


In California, the governor’s salary is quite high compared with that of Utah, as
represented by the curvature of the mouth. Nevada has the highest divorce rate, as
represented by hair length.


Example 23
Multiplots: One-way 


The simplest application of the MULTIPLOT layout is for a plot stratified by a single
grouping variable. This example shows a scatterplot of summer and winter
temperatures stratified by region of the US. The GROUP option specifies the REGIONS
variable for producing the stratification. The MULTI option glues the separate plots
together in a single table.

The default number of columns for a one-way multiplot is 3. After 3 columns,
SYSTAT begins a new row of plots. The gap between the rows leaves room for the
category labels above the plots and signals that the table is one-way rather than
two-way. You can change this orientation by specifying a different number of rows or
columns for the table. Notice that the labeling of the scales (SUMMER and WINTER)
at the bottom and right of the table margins makes the individual plots readable
according to a simple rule: viewers look to the top and left for category values and to
the bottom and right for scale values. The labels and scale values are repeated so that
there is no confusion about the meaning of a particular scale, and there are no blank
scales at the margins of the plot. Indented tick marks keep the scales from colliding
(although you can change this feature by requesting TICK=FLUSH or TICK=FLOAT).


The input is: 


USE US 
PLOT SUMMER*WINTER / GROUP=REGIONS, MULTI, FILL


The output is:

[Figure: scatterplots of SUMMER (65 to 95) against WINTER (0 to 70), one panel per region, three panels per row, from Great Lakes, Mid Atlantic, and Mountain through South and Southeast.]


Example 24 
Formatting Multiplots 


The following example shows how to change the format of a one-way MULTIPLOT
layout to a row-oriented pattern. By adding the ROW=4 option, we force SYSTAT to
arrange the plots in two columns of 4 rows each. Notice that row labels are now used
to denote the categories and the column labels are blank (because this is only a
one-way MULTIPLOT). The general rule of category/table labels to the top and left and
scale labels to the bottom and right is maintained.

The input is: 
USE US 
PLOT SUMMER*WINTER/GROUP=REGIONS, ROW=4, MULTI, FILL 


The output is: 


[Figure: the same scatterplots arranged in two columns of four rows, labeled by region (Great Lakes, Mid Atlantic, Mountain, New England, Pacific, Plains, ...).]
Transposing Multiplots


Here, we transpose the table from Example 24. We accomplish this by using COL=4
instead of ROW=4. Because there are 8 levels of the REGIONS variable, we now get a
regular table of 2 rows and 4 columns (as opposed to the default ragged-edge table; see
“Multiplots: One-way” on page 305).
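The three layouts follow from simple arithmetic on the number of groups: the default in Example 23, ROW=4 in Example 24, and COL=4 here. This sketch assumes that panels are filled row by row in every case, which matches the panel order visible in the Example 24 output. `multiplot_grid` is a hypothetical helper, and the region names are the eight levels shown in these examples.

```python
import math


def multiplot_grid(levels, rows=None, cols=None):
    """Arrange one panel per group level in a grid, filled row by row.

    With neither rows nor cols given, 3 columns are used (the one-way
    default). If only rows is given, the column count is derived from it.
    The last row may be shorter than the others (a ragged edge).
    """
    if rows is None and cols is None:
        cols = 3
    if cols is None:
        cols = math.ceil(len(levels) / rows)
    return [levels[i:i + cols] for i in range(0, len(levels), cols)]


REGIONS = ["Great Lakes", "Mid Atlantic", "Mountain", "New England",
           "Pacific", "Plains", "South", "Southeast"]

for layout in ({}, {"rows": 4}, {"cols": 4}):
    shape = [len(row) for row in multiplot_grid(REGIONS, **layout)]
    print(layout or "default", shape)
```

The default call gives rows of 3, 3, and 2 panels (the ragged edge of Example 23), while ROW=4 and COL=4 both give regular tables.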


The input is: 


USE US 
PLOT SUMMER*WINTER / GROUP=REGIONS, COL=4, MULTI, FILL


The output is: 
[Figure: the same scatterplots arranged in 2 rows of 4 columns, labeled by region, beginning Great Lakes, Mid Atlantic, Mountain.]


Example 25
Multiplot Customization 


The following example shows how to add several options to customize a one-way table 
layout. We have created a columnwise table of scatterplots for the separate species in 
the Fisher Iris data set. By adding the STICK option, we cause the tick marks to extend 
outside the plot frames, more closely resembling the layout of Trellis plots. In addition, 
we add the ELL option to produce a confidence ellipse on the cases in the plot. We also
modify the scale labels by adding XLAB and YLAB options. The tabular category labels 
are produced through the global LABEL command. 


The input is: 


USE IRIS 

LABEL SPECIES / 1='Setosa', 2='Versicolor', 3='Virginica'

PLOT SEPALLEN*SEPALWID / GROUP=SPECIES, MULTI, STICK, ELL,
XLAB='Sepal Width', YLAB='Sepal Length'


The output is: 


[Figure: scatterplots of Sepal Length against Sepal Width, each with a confidence ellipse, in separate panels for Setosa, Versicolor, and Virginica under the heading SPECIES.]


Example 26 
Line Chart Multiplot 


The next example shows a one-way MULTIPLOT layout of a LINE chart. The data are
from Fisher (1935) and are featured in Cleveland (1993). They are the yields of 10
varieties of barley in two years (1931 and 1932) at 6 sites in the Midwestern US. We
use only one year's data (1931) and plot against VARIETY$ in this one-way example.
This row plot demonstrates that the plots themselves may contain categorical
variables, in this case VARIETY$. SYSTAT angles the labels when there is insufficient
room to display them horizontally.
The input is: 
USE BARLEY 
LINE Y1931*VARIETY$ / GROUP=SITE$, MULTI, YLAB='1931 Yield'
The output is: 
[Figure: line charts of 1931 Yield against VARIETY$, one panel per site (Crookston, Duluth, Grand Rapids, Morris, University, Waseca); the variety labels are angled.]
Example 27 


Multiplots and Axis Labels 


When we do not want angled labels, we can place them at the right side by changing
our specification. The following example plots the 1931 yields against the sites. We
make the specification SITE$*Y1931 in order to force the labels to the right margin.


The input is:


USE BARLEY 
DOT SITE$*Y1931 / GROUP=VARIETY$,MULTI,FILL,SIZE=2, 
XLAB='1931 Yield' 


The output is: 
[Figure: dot plots of the six sites (Crookston, Duluth, Grand Rapids, Morris, University, Waseca) against 1931 Yield, one panel per variety, including Manchuria, No. 457, No. 462, No. 475, Peatland, Svansota, and Trebi.]


Example 28 
Multiplots: Two-way 


This example shows a two-way MULTIPLOT layout. More extensive programming is
used to reproduce the details of an example in Cleveland (1993). First of all, our data
set is not structured to produce a two-way analysis, because the yield variable is spread
over two columns, one for each year. This structure is a repeated measures layout; it
is suitable for a multivariate analysis using GLM, but not for using the columns as a
grouping variable. Thus, we use the WRAP facility to stack the data set by year. WRAP
creates a TRIAL variable to denote which column the data came from, so we create a
new character variable YEAR$ from this TRIAL variable to denote the two years.

To arrange the plot in the order Cleveland uses, we order the sites with the ORDER
SITE$ command and the varieties with the ORDER VARIETY$ command. To match the
aspect ratio of the Cleveland plot, we use SCALE 200,100, which makes each panel
twice as wide as it is tall. After this plot, we reset SCALE to prevent further graphics
from being rescaled.
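Stacking by year is an ordinary wide-to-long reshape. This Python sketch imitates what the text says WRAP does. It adds a TRIAL index for the source column and then derives YEAR$ with the same rule as the LET and IF commands in the input. The MEASURE column name follows the PLOT command in the input. The output row order is an assumption, and the yields are placeholder numbers, not values from the BARLEY file.

```python
def wrap(rows, columns):
    """Stack `columns` into a single MEASURE column (wide to long).

    Each input row yields one output row per stacked column. Other fields
    are copied, TRIAL records the 1-based position of the source column,
    and MEASURE holds its value.
    """
    stacked = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in columns}
        for trial, col in enumerate(columns, start=1):
            stacked.append({**kept, "TRIAL": trial, "MEASURE": row[col]})
    return stacked


# Placeholder yields for illustration, not values from the BARLEY file.
data = [{"SITE$": "Waseca", "VARIETY$": "Trebi", "Y1931": 60.0, "Y1932": 50.0}]

stacked = wrap(data, ["Y1931", "Y1932"])
for r in stacked:
    # Mirrors LET YEAR$='1931' followed by IF TRIAL=2 THEN LET YEAR$='1932'.
    r["YEAR$"] = "1932" if r["TRIAL"] == 2 else "1931"
    print(r)
```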


The input is: 


USE BARLEY
WRAP Y1931,Y1932
LET YEAR$='1931'
IF TRIAL=2 THEN LET YEAR$='1932'
ORDER SITE$ / SORT='Waseca','Crookston','Morris','University',
'Duluth','Grand Rapids'
ORDER VARIETY$ / SORT='Svansota','No.462','Manchuria','No.475',
'Velvet','Peatland','Glabron','No.457',
'Wisconsin','Trebi'
SCALE 200,100
PLOT VARIETY$*MEASURE / GROUP=SITE$, YEAR$, MULTI, XLAB='Yield',
YGRID, SIZE=2, FILL
SCALE


The output is:

[Figure: two-way multiplot of variety against yield, with one panel for each combination of site (for example, Crookston) and year, reproducing Cleveland's barley display.]
Example 29 
Multiplot Alternatives 


Sometimes we do not want to use the MULTIPLOT layout when creating multiplots.
Some options, like TRANSPOSE, and some graphics, like DENSITY, are not available with
MULTIPLOT. These omissions are due either to layout issues (as with TRANSPOSE) or to
problems with equating vertical scales for different sample sizes (as with DENSITY). In
these cases, it is better to omit the MULTIPLOT option and use GROUP by itself. The
following example shows how to do so for a scatterplot matrix (SPLOM). We create
three separate scatterplot matrices, one for each species in the Iris data set. The result
is a multiplot of multiplots.
The input is: 
USE IRIS 
LABEL SPECIES / 1='Setosa', 2='Versicolor', 3='Virginica'
SPLOM SEPALWID, SEPALLEN, PETALWID, PETALLEN / GROUP=SPECIES
The output is:
Example 30 
Multiplots and 3D Graphs 


The MULTIPLOT option is not available for 3D graphs, because the GROUP option
already produces 3D multiplots on its own. The following example produces 3D
histograms for the Iris data. We add ZMAX=15 to put all the plots on the same COUNT
scale. The spacing of the 3D plots is designed to allow rotation without collisions. If
you examine 3D multiplots in the Graphics window with the Dynamic Explorer tool,
you can rotate them in real time by pressing the rotation buttons.

The input is:
USE IRIS 


LABEL SPECIES / 1='Setosa', 2='Versicolor', 3='Virginica'
DENSITY .*SEPALWID*SEPALLEN / GROUP=SPECIES, AX=CORNER, ZGRID, ZMAX=15


The output is: 
Function Plots


Leland Wilkinson 


Function Plot produces two- and three-dimensional function plots. You type the
equation to plot using SYSTAT operators and functions. The notation and syntax for
equations follow standard BASIC. For example, the equation y = x^2 produces a
parabola.

SYSTAT analyzes your function, determines axis values, and draws it. Data values
are not used. SYSTAT determines the plot domain (horizontal axis limits) and range
(vertical axis limits) of the function by numerically computing second derivatives
(how fast the slope, the change in y per unit change in x, is itself changing) and
indicators of periodicity and monotonicity. SYSTAT can generally handle
discontinuities in the function as well. It should not fail with very large or very small
values, and it clips values outside of the plot frame. If it fails, you can specify axis
limits.

Two-variable equations have the form y = f(x) and are plotted in a rectangular,
polar, or spherical coordinate system. Three-variable equations have the form
z = f(x, y) and are plotted in a rectangular, polar, spherical, or triangular coordinate
system. In addition, for three-variable equations, you can choose contour or mosaic
displays.


Function Plot Dialog Box
To open the Function plot dialog box, from the menus choose: 


Graph 
Function Plot... 


[Dialog box: Graph: Function Plot, with a Function type list (for example, Mathematical), a Function(s) field, and tabs that include Layout, Legend, Surface and Line Style, Coordinates, X Axis, Y Axis, and Z Axis.]


The following options are available: 


■ Equation. Type your function using the functions available for the selected
Function type.


■ Display as contour. For three-variable plots, you can produce contour plots.
SYSTAT automatically determines the number of contours to draw so that the
surface is delineated and the contour labels are round numbers. You can change
this number by selecting Z-Axis and specifying a number of ticks (the number of
tick marks on the vertical axis determines the number of contours to be drawn).


■ Display as mosaic. Generates a plot contoured with shading fill patterns in color
gradations determined by the z height of the function at a given pixel.


In addition, you can specify a different coordinate system and customize the axes, 
layout, and appearance of lines and surfaces. 


Using Commands 


To generate the following types of function displays, use:

Two-variable      FPLOT y = f(x) ; POLAR or SPHERE
Three-variable    FPLOT z = f(x, y) ; POLAR or SPHERE or TRI or CONTOUR or TILE
Four-variable     FPLOT w = f(x, y, z) ; TRI


Examples 
Example 1 
Two-Variable Displays 
Rudnick and Gaspari (1987) use functions of the following form:

$$y = \sqrt{x}\,e^{-x}$$

to model the distribution of the radius of gyration of multidimensional random walks
and related fractals. To plot this function,


The input is: 
FPLOT p = SQR(r)*EXP(-r) ; XMIN=0 XMAX=10 YMIN=0 YMAX=.5


Notice that a semicolon (;) is used before the option list instead of a slash (/). In this 
context, SYSTAT interprets a slash as a division sign. 
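A quick calculation shows why YMAX=.5 suits this plot. Setting the derivative of √x·e^(−x) to zero gives 1/(2√x) = √x, so the peak is at x = 1/2, with height 1/√(2e) ≈ 0.429. The sketch below confirms this by brute-force sampling over the plotted domain. The grid size is arbitrary, and this is not how SYSTAT chooses its axes.

```python
import math


def f(x):
    """p = SQR(r)*EXP(-r), where SQR is the square root."""
    return math.sqrt(x) * math.exp(-x)


# Sample the plotted domain, XMIN=0 to XMAX=10, at 10,001 points.
n = 10_000
xs = [10 * i / n for i in range(n + 1)]
x_peak = max(xs, key=f)

print(x_peak, f(x_peak))           # sampled peak
print(1 / math.sqrt(2 * math.e))   # analytic peak height
```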
The output is:


Try some of these other functions:

SIN(COS(TAN(x)))
1 / SIN(x)
SIN(EXP(x))
SIN(1/x)
ATN(COS(x))
1 / x^2 - SIN(x)
1 / LOG(ABS(TAN(x)))


Example 2
Statistical Distributions


Function Plot can produce a variety of statistical distribution plots. SYSTAT’s built-in 
functions include: 


Distribution                  Cumulative            Density               Inverse               Random data
Uniform (0,1)                 UCF(x, low, hi)       UDF(x, low, hi)       UIF(α, low, hi)       URN(low, hi)
Normal (0,1)                  ZCF(z, loc, sc)       ZDF(z, loc, sc)       ZIF(α, loc, sc)       ZRN(loc, sc)
t                             TCF(t, df)            TDF(t, df)            TIF(α, df)            TRN(df)
F                             FCF(F, df1, df2)      FDF(F, df1, df2)      FIF(α, df1, df2)      FRN(df1, df2)
Chi-square                    XCF(χ², df)           XDF(χ², df)           XIF(α, df)            XRN(df)
Gamma                         GCF(x, shp, sc)       GDF(x, shp, sc)       GIF(α, shp, sc)       GRN(shp, sc)
Beta                          BCF(x, shp1, shp2)    BDF(x, shp1, shp2)    BIF(α, shp1, shp2)    BRN(shp1, shp2)
Exponential (0,1)             ECF(x, loc, sc)       EDF(x, loc, sc)       EIF(α, loc, sc)       ERN(loc, sc)
Logistic (0,1)                LCF(x, loc, sc)       LDF(x, loc, sc)       LIF(α, loc, sc)       LRN(loc, sc)
Studentized range             SCF(x, k, df)         SDF(x, k, df)         SIF(α, k, df)         SRN(k, df)
Weibull                       WCF(x, sc, shp)       WDF(x, sc, shp)       WIF(α, sc, shp)       WRN(sc, shp)
Cauchy                        CCF(x, loc, sc)       CDF(x, loc, sc)       CIF(α, loc, sc)       CRN(loc, sc)
Double exponential            DECF(x, loc, sc)      DEDF(x, loc, sc)      DEIF(α, loc, sc)      DERN(loc, sc)
Gompertz                      GOCF(x, b, c)         GODF(x, b, c)         GOIF(α, b, c)         GORN(b, c)
Gumbel                        GUCF(x, loc, sc)      GUDF(x, loc, sc)      GUIF(α, loc, sc)      GURN(loc, sc)
Inverse Gaussian (Wald)       IGCF(x, loc, sc)      IGDF(x, loc, sc)      IGIF(α, loc, sc)      IGRN(loc, sc)
Logit normal                  ENCF(x, loc, sc)      ENDF(x, loc, sc)      ENIF(α, loc, sc)      ENRN(loc, sc)
Lognormal                     LNCF(x, loc, sc)      LNDF(x, loc, sc)      LNIF(α, loc, sc)      LNRN(loc, sc)
Pareto                        PACF(x, thr, shp)     PADF(x, thr, shp)     PAIF(α, thr, shp)     PARN(thr, shp)
Rayleigh                      RCF(x, sc)            RDF(x, sc)            RIF(α, sc)            RRN(sc)
Triangular                    TRCF(x, a, b, c)      TRDF(x, a, b, c)      TRIF(α, a, b, c)      TRRN(a, b, c)
Loglogistic                   LOCF(x, logsc, shp)   LODF(x, logsc, shp)   LOIF(α, logsc, shp)   LORN(logsc, shp)
Erlang                        ERCF(x, shp, sc)      ERDF(x, shp, sc)      ERIF(α, shp, sc)      ERRN(shp, sc)
Non-central chi-square        NXCF(x, df, δ)        NXDF(x, df, δ)        NXIF(α, df, δ)        NXRN(df, δ)
Non-central F                 NFCF(x, df1, df2, δ)  NFDF(x, df1, df2, δ)  NFIF(α, df1, df2, δ)  NFRN(df1, df2, δ)
Non-central t                 NTCF(x, df, δ)        NTDF(x, df, δ)        NTIF(α, df, δ)        NTRN(df, δ)
Smallest extreme value        SECF(x, loc, sc)      SEDF(x, loc, sc)      SEIF(α, loc, sc)      SERN(loc, sc)
Studentized maximum modulus   SMCF(x, k, df)        SMDF(x, k, df)        SMIF(α, k, df)        SMRN(k, df)
Binomial                      NCF(x, n, p)          NDF(x, n, p)          NIF(α, n, p)          NRN(n, p)
Poisson                       PCF(x, λ)             PDF(x, λ)             PIF(α, λ)             PRN(λ)
Discrete uniform              DUCF(x, N)            DUDF(x, N)            DUIF(α, N)            DURN(N)
Geometric                     GECF(x, p)            GEDF(x, p)            GEIF(α, p)            GERN(p)
Hypergeometric                HCF(x, N, m, n)       HDF(x, N, m, n)       HIF(α, N, m, n)       HRN(N, m, n)
Negative binomial             NBCF(x, k, p)         NBDF(x, k, p)         NBIF(α, k, p)         NBRN(k, p)
Benford's Law                 BLCF(x, B)            BLDF(x, B)            BLIF(α, B)            BLRN(B)
Logarithmic series            LSCF(x, theta)        LSDF(x, theta)        LSIF(α, theta)        LSRN(theta)
Zipf                          ZICF(x, shp)          ZIDF(x, shp)          ZIIF(α, shp)          ZIRN(shp)

where low is the smallest value and hi the largest value; loc is the location parameter
and sc the scale parameter; shp is the shape parameter and thr the threshold parameter;
df is the degrees of freedom; δ is the noncentrality parameter; and α, the first argument
of each inverse function, is a cumulative probability. If low, hi, loc, or sc is omitted,
the default values shown in the Distribution column are assumed.


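To plot any of these distributions, put the independent variable(s) in place of the
parameter(s) in parentheses. For example, TDF(t,10) treats t as the variable to plot and
fixes the degrees of freedom at 10.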


The input is: 
FPLOT y = TDF(t,10) ; XLABEL='t' YLABEL='Density' XMIN=-3,
XMAX=3
FPLOT y = DEDF(x,0,1) ; XMIN=-6.5, XMAX=6.5,
XLABEL='Double Exponential',
YLABEL='Density', YMAX=0.525
FPLOT y = XDF(x,5) ; XMAX=20 YMAX=0.16 YLABEL='Density',
XLABEL='Chi-square'


The output is:


[Figure: the t, double exponential, and chi-square density plots.]


Example 3 
Polar Function Plot 


Polar coordinates translate Cartesian coordinates into a circular arrangement. Each
point is given by its distance (r) from the origin and the angle (θ) between the x axis
and the vector from the origin to the point. Polar coordinates may seem confusing in
some contexts, but they are handy for representing many mathematical equations and
data profiles.

SYSTAT automatically scales polar graphs as well as rectangular graphs. To alter
the scale by setting minimum and maximum values, use the x axis for the θ axis (angle)
and the y axis for the r axis (distance). By default, the minimum of the y axis (r) is at
the center of the circle, and the maximum of the y axis is at its periphery. When doing
a polar plot with three variables, use the z axis to control z height, as with Cartesian
coordinates.

If the data collide with the axes, you can change the axes displayed. If you specify
AXES = BOTTOM, you get only the circle, omitting the radial axis. If you specify
AXES = LEFT, you get only the radial axis. The default specification displays both the
r and θ axes. If you specify AXES = NONE, then no axes are drawn. In addition, you
can specify which scales to display using SCALE = BOTTOM, SCALE = LEFT, or
SCALE = NONE.


For example, to create a two-dimensional flower with eight petals, 
The input is: 
FPLOT y = SIN(8*x) ; POLAR AXES=NONE SCALE=NONE, 


XMIN=-0.017 XMAX=6.28, 
YMAX=1 YMIN=-1 XLABEL=' ' YLABEL=' '


The output is: 


You can alter the number of petals by changing 8 to some other number. One revolution
of the circle is approximately 6.28 (2π) radians.
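The petal count depends on how the radius is drawn. In textbook polar coordinates, r = sin(8θ) is a rose with 16 petals, because negative radii are plotted on the opposite side. Here, YMIN=-1 sits at the center, so the distance from the center is proportional to SIN(8*x) + 1. That quantity is never negative and peaks eight times per revolution, giving eight petals. The sketch assumes the linear y-to-radius mapping described above (minimum at the center, maximum at the rim) and counts peaks numerically. `count_peaks` is a hypothetical helper.

```python
import math


def count_peaks(radii):
    """Count local maxima in a closed (circular) sequence of radii."""
    n = len(radii)
    return sum(
        1 for i in range(n)
        if radii[i] > radii[i - 1] and radii[i] >= radii[(i + 1) % n]
    )


samples = 3600
thetas = [2 * math.pi * i / samples for i in range(samples)]

# Assumed SYSTAT-style mapping: radius grows with y - YMIN, with YMIN = -1.
shifted = [math.sin(8 * t) + 1 for t in thetas]
# Textbook polar rose: negative r is drawn on the opposite side,
# so each petal's length is |sin(8*theta)|.
textbook = [abs(math.sin(8 * t)) for t in thetas]

print(count_peaks(shifted), count_peaks(textbook))
```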


Example 4 
Spherical Function Plot 


Spherical coordinates work like polar coordinates, with an additional angle for 
elevation. Using a spherical coordinate system for a two-variable function draws the 
equation on the surface of a unit sphere. 


The following example illustrates a three-variable spherical function. 
The input is:


BEGIN
FPLOT z = SIN(x) + COS(y) ; SPHERE LOC=-2IN,0IN
FPLOT z = SIN(x) + COS(y) ; SPHERE SURFACE=XYCUT LOC=2IN,0IN
END 
The output is: 


After they were initially displayed, these plots were rotated using the Dynamic 
Explorer. 


Example 5 
Three-Variable Displays 


SYSTAT produces three-dimensional function plots when you have two predictors in 
the equation. 


The input is: 


FPLOT z = x^2 - y^2


The output is:


The program recognizes the two predictors x and y automatically and places the plot in 
three-dimensional perspective. 


Picking the number of cuts in a grid. For contour plotting in two dimensions, SYSTAT 
computes a 25 x 25 grid, and for surface plotting in three dimensions, a 30 x 30 grid. 
These settings are usually sufficient to give good resolution while conserving 
computer time. If desired, you can change the value for grid cuts. For example, to 
choose 40 cuts, type CUT=40. You may choose up to 60 cuts and as few as 2 on most 
machines, but you should rarely need fewer than 10 or more than 40. 

Setting the number of grid cuts to 10 is the best way to save computer time when 
you are trying to get a rough sketch of a surface. If you want a more detailed view, set 
the number of grid cuts to 40. You may need an even larger value for some complex 
mathematical functions with steep cliffs. 


The following function:

$$z = \frac{2\sin\left(2\sqrt{x^2 + y^2}\right)}{\sqrt{x^2 + y^2}}$$

describes a damped three-dimensional sine wave. To plot this function,


The input is: 


FPLOT z = 2*SIN(2*SQR(x^2 + y^2)) / SQR(x^2 + y^2) ; XMIN=-5 XMAX=5,
YMIN=-5 YMAX=5
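Grid cuts set how many points the surface is evaluated at. This sketch evaluates the damped sine wave on a square grid. It assumes that CUT counts intervals per axis, so CUT=30 yields 31 × 31 points; whether SYSTAT counts intervals or points is not stated. At the origin the formula is 0/0, so the sketch substitutes its limit, 4, because sin(2r)/r approaches 2 as r approaches 0. Both helpers are hypothetical.

```python
import math


def damped_sine(x, y):
    """z = 2*SIN(2*SQR(x^2 + y^2)) / SQR(x^2 + y^2), with the limit 4 at r = 0."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 4.0
    return 2 * math.sin(2 * r) / r


def evaluate_grid(f, xmin, xmax, ymin, ymax, cut):
    """Evaluate f on a (cut + 1) x (cut + 1) grid of equally spaced points."""
    xs = [xmin + (xmax - xmin) * i / cut for i in range(cut + 1)]
    ys = [ymin + (ymax - ymin) * j / cut for j in range(cut + 1)]
    return [[f(x, y) for x in xs] for y in ys]


grid = evaluate_grid(damped_sine, -5, 5, -5, 5, cut=30)
print(len(grid), len(grid[0]), max(max(row) for row in grid))
```

A rough sketch with cut=10 costs about one tenth as many evaluations as cut=30, which mirrors the text's advice to use 10 cuts for a quick look.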
Try some of these other functions:


Function                              Axis limits
z = x/exp(x*x) * y/exp(y*y)           Xmin = -3     Xmax = 3
                                      Ymin = -3     Ymax = 3
                                      Zmin = -5     Zmax = 5
z = x*y*(x*x - y*y)/(x*x + y*y)       Xmin = -10    Xmax = 10
                                      Ymin = -10    Ymax = 10
                                      Zmin = -100   Zmax = 100
z = sin(sqr(x*x + y*y))               Xmin = -5     Xmax = 5
                                      Ymin = -5     Ymax = 5
                                      Zmin = -1     Zmax = 1
z = x*x/8 - y*y/12                    Xmin = -10    Xmax = 10
                                      Ymin = -10    Ymax = 10
                                      Zmin = -10    Zmax = 10


Example 6 
Chi-Square Distribution 


We demonstrate a plot of the chi-square distribution (the chi-square on the x axis, the 
degrees of freedom on the y axis, and the density on the z axis). 
The input is:
FPLOT p = XDF(chisq,df) ; CUT=40 XMIN=0 XMAX=20 YMIN=0 YMAX=10,
ZMIN=0 ZMAX=.5 SURFACE=YCUT,
XLAB='Chi-square',
YLAB='Degrees of Freedom',
ZLAB='Density'
The output is: 
If you want to view the display from another perspective, try rotating it dynamically 
on the screen using the Dynamic Explorer tool. Moreover, you can reverse the scale on 
the x axis or y axis. 
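Each slice of this surface at a fixed number of degrees of freedom k is the chi-square density f(x; k) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)). The sketch below evaluates it in log space for numerical stability. `chi_square_density` is an illustrative helper, not SYSTAT's XDF. It also suggests one reason ZMAX=.5 works well: the density at x = 0 is exactly 0.5 when k = 2, while for k < 2 it is unbounded near 0 and gets clipped by the plot frame.

```python
import math


def chi_square_density(x, df):
    """Chi-square density with df > 0 degrees of freedom."""
    if x < 0:
        return 0.0
    k = df / 2.0
    if x == 0:
        # At the origin the density is infinite for df < 2, 0.5 for df == 2,
        # and 0 for df > 2.
        if df < 2:
            return math.inf
        return 0.5 if df == 2 else 0.0
    log_pdf = (k - 1) * math.log(x) - x / 2 - k * math.log(2) - math.lgamma(k)
    return math.exp(log_pdf)


for df in (1, 2, 5, 10):
    print(df, [round(chi_square_density(x, df), 4) for x in (0.5, 2.0, 8.0)])
```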
Example 7 
Contour Plots 


To produce contour plots for three-variable plots, SYSTAT first computes its own 
square grid of interpolated or directly estimated values. From the grid, contours are 
followed using the method of Lodwick and Whittle (1970) combined with linear 
interpolation. This method is guaranteed to find proper contours if the grid is fine 
enough. 
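The linear-interpolation step can be shown on a single grid edge. When a contour level lies between the surface values at the edge's two endpoints, the crossing point is placed proportionally along the edge. This covers only that interpolation step, not the full Lodwick and Whittle (1970) tracing procedure, and `edge_crossing` is a hypothetical name.

```python
def edge_crossing(p0, p1, z0, z1, level):
    """Where the contour `level` crosses the grid edge p0 -> p1, or None.

    z0 and z1 are the surface values at the two endpoints; the crossing is
    placed by linear interpolation between them.
    """
    if z0 == z1 or (z0 - level) * (z1 - level) > 0:
        return None  # flat edge, or level not bracketed by z0 and z1
    t = (level - z0) / (z1 - z0)
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))


print(edge_crossing((0.0, 0.0), (2.0, 0.0), 1.0, 5.0, 4.0))  # (1.5, 0.0)
```

Applying this to every edge of every grid cell yields the points that a tracing procedure then links into contour lines. A finer grid (larger CUT) places those points more accurately.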

SYSTAT automatically determines the number of contours to draw so that the
surface is delineated and the contour labels are round numbers. You can modify this
number by specifying the number of ticks on the vertical (z) axis. In addition, plots can
contain contour lines (CONTOUR) or contoured shading (TILE), that is, fill patterns in
color gradations determined by the z height of the function at a given pixel.


Here, we present three plots of the function z = SIN(x)*COS(y): two contour plots (the second with ZTICK=10) and one mosaic plot (TILE).


The input is: 
BEGIN 
FPLOT z = SIN(x)*COS(y) ; CONTOUR XMAX=4 XMIN=-4 YMAX=4,
YMIN=-4 LOC=-3IN,1IN
FPLOT z = SIN(x)*COS(y) ; CONTOUR ZTICK=10 XMAX=4 XMIN=-4,
YMAX=4 YMIN=-4 LOC=3IN,1IN
FPLOT z = SIN(x)*COS(y) ; TILE XMAX=4 XMIN=-4 YMAX=4,
YMIN=-4 LOC=0IN,-5IN
END 
The output is: 


Mosaic plots look the best on color monitors. 
Contouring with Triangular Coordinates


You can contour using triangular coordinates. Triangular plots are often used to 
analyze mixture experiments and examine four-dimensional data in two dimensions. 
Diamond (2001) summarizes their use and references more advanced material. 


The input is: 


BEGIN 
FPLOT u = v + w^2 + x^3 - 2.5*v*w*x / TRI CONTOUR ZMAX=.8,
LOC=-3.5IN,0IN
FPLOT u = v + w^2 + x^3 - 2.5*v*w*x / TRI TILE ZMAX=.8,
LOC=3.5IN,0IN
END


Example 8 
Three-Dimensional Functions and Contours 


In this example, we place contours below a 3-D function plot. Use EYE to adjust the 
perspective of the plot for best viewing. For the contour plot, we use CUT=50 to make 
the contours smoother and ZTICK to draw more than a few contours. For the surface,
we make the mesh finer using CUT=60. 
The input is:


EYE -6,-8,5
BEGIN
FACET XY
FPLOT Z=EXP(-X^2)*EXP(-Y^2)*X / SCALE=0 CONTOUR CUT=50,
ZTICK=21,
XMIN=-3 XMAX=3 YMIN=-3,
YMAX=3
FACET
FPLOT Z=EXP(-X^2)*EXP(-Y^2)*X / CUT=60 ZTICK=5,
ZMIN=-.5 ZMAX=.5,
XMIN=-3 XMAX=3,
YMIN=-3 YMAX=3
END

The output is: 



Example 9 
Highlighting Surfaces 


SYSTAT offers several surfaces for three-dimensional function plots. The following
five plots show the same function using different surfaces for the display.
The input is:


BEGIN
SCALE 50 50
FPLOT z = EXP(-x^2 - y^2 + x*y) / SURFACE=XYCUT TITLE="XYcut",
LOC=-4IN,6.5IN
FPLOT z = EXP(-x^2 - y^2 + x*y) / SURFACE=XCUT XMIN=-2.5,
XMAX=2.5 YMIN=-2.5,
YMAX=2.5 ZMIN=0 ZMAX=1,
ZPIP=10 CUT=40,
TITLE="Xcut" LOC=4IN,6.5IN
FPLOT z = EXP(-x^2 - y^2 + x*y) / SURFACE=YCUT XMIN=-2.5,
XMAX=2.5 YMIN=-2.5,
YMAX=2.5 ZMIN=0 ZMAX=1,
ZPIP=10 CUT=40,
TITLE="Ycut" LOC=-4IN,0IN
FPLOT z = EXP(-x^2 - y^2 + x*y) / SURFACE=ZCUT XMIN=-2.5,
XMAX=2.5 YMIN=-2.5,
YMAX=2.5 ZMIN=0 ZMAX=1,
ZPIP=10 CUT=60,
TITLE="Zcut" LOC=4IN,0IN
FPLOT z = EXP(-x^2 - y^2 + x*y) / SURFACE=COLOR XMIN=-2.5,
XMAX=2.5 YMIN=-2.5 YMAX=2.5,
ZMIN=0 ZMAX=1 ZPIP=10,
TITLE="Color" LOC=0IN,-6IN
END
SCALE
Maps


Map produces maps in Oblique Gnomonic, Oblique Stereographic, Mercator
Conformal, Oblique Orthographic, Lambert Equal Area Cylindrical, Miller
Cylindrical, Robinson, Sinusoidal, Fisheye, and Peters projections. To draw maps in
these projections, you can choose from among the many map files supplied with SYSTAT
(on your CD-ROM), or you can create your own map files. Options are available for filling map
polygons with colors, shading, or patterns to indicate the values of a variable (for
example, average income within each country). You can also display the value
of a variable within a polygon by using contours or icons.
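To make the idea of a projection concrete, here is the textbook spherical form of one projection from the list, the Mercator conformal projection: a point at longitude λ and latitude φ (in radians) maps to x = λ and y = ln tan(π/4 + φ/2) on a unit sphere. The Python sketch below implements only that formula; how SYSTAT scales, centers, or clips its own Mercator output is not described in this chapter, and nothing here should be read as its implementation.

```python
import math

def mercator(lon_deg, lat_deg):
    """Spherical Mercator on a unit sphere: degrees in, projected (x, y) out."""
    if abs(lat_deg) >= 90:
        raise ValueError("the Mercator projection is undefined at the poles")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return lam, math.log(math.tan(math.pi / 4 + phi / 2))

# Equal 10-degree steps of latitude become increasingly tall bands on the flat map.
for lat in (0, 30, 60):
    band = mercator(0, lat + 10)[1] - mercator(0, lat)[1]
    print(f"{lat}-{lat + 10} degrees: height {band:.3f}")
```

This growing stretch toward the poles is why Mercator maps exaggerate the size of high-latitude regions and are usually cut off well short of 90 degrees.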

If you plan to work with maps, you might want to read some of the research on map
coloring, shading, and contouring. The geography literature contains many books on these
topics. Gale and Halperin (1982) discuss continuously shaded maps, which you
can create in SYSTAT with the Fill option. Trumbo (1981) and Wainer and Francolini
(1980) discuss problems in coloring statistical maps.
Sample Displays


U.S. map shaded by number of tornados


Mercator projection of the world


Stereographic projection of the U.S.


Peters projection of the world


Map Dialog Box 


To open the Map dialog box, from the menus choose:


Graph 
Map... 


The following options are available:
Projection. Select one of the following types of geographic projections: 
Fisheye Lambert Cylindrical Mercator Conformal 
Oblique Gnomonic Oblique Orthographic Oblique Stereographic 
Peters Robinson Sinusoidal 
Miller Cylindrical 
Spherical. Plots the map on a globe. 
You can select cases from the data file to map only part of your file. For instance, you 
could select only the Midwest states in USSTATES. 
In addition, you can customize the layout, axes, and appearance of the maps. 
Map Files 


SYSTAT plots maps using two files at the same time: 


• The first file contains the coordinates of the boundaries of each polygon (state,
precinct, and so on) and has an .SMP extension.


• The second file contains data about each polygon and is a regular SYSTAT data
file with an .SYZ extension.


The .SMP File with Coordinates 


Here are two neighboring zip code areas (one square-shaped, the other a hexagon): 
Figure: Zone 1 (square) and Zone 2 (hexagon)


The coordinates of the zones are:


Zone 1  (0,0) (0,1) (1,1) (1,0)
Zone 2  (0,0) (1,0) (2,-1) (1,-2) (0,-2) (-1,-1)


These coordinates are expressed as data in the .SMP file as:


ID NP X1 Y1 X2 Y2 X3 Y3 X4 Y4 X5 Y5 X6 Y6 X7 Y7
1  5  0  0  0  1  1  1  1  0  0  0
2  7  0  0  1  0  2 -1  1 -2  0 -2 -1 -1  0  0


ID is an identification number for each polygon. NP is the number of points (x-y pairs)
in the polygon, and x and y are the longitude and latitude, respectively, of each point
on the boundary of the polygon. Note that the last x-y pair of a boundary matches its first
pair; this is necessary so that the boundary is closed. Hence, if a closed boundary
has n sides, NP = n + 1. SYSTAT plots a “map” for any arbitrary polygon with x-y
coordinates in any units, but most geographical data files describe longitude and latitude
in decimal degrees. The records in this file can have unequal lengths (up to 999
characters), and the data can be integer, real, or exponential. If you are uncertain about a
format, the simplest is one with two numbers per record, separated by a blank or comma:


ID NP 
X Y 
X Y 
X Y 
ID NP 




and so on. 
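The simplest layout just described (a header record with ID and NP, then one x-y pair per record) is easy to generate with a short script. This Python sketch is our own, not a SYSTAT utility; the function name smp_records is hypothetical, and it assumes that closing each boundary by repeating its first point (so that NP = n + 1) is all the format requires. The two polygons are a square and a hexagon like the two zip code zones above.

```python
def smp_records(polygons):
    """Yield text records in the simple 'ID NP' then 'X Y' layout.

    polygons: list of (polygon_id, [(x, y), ...]) pairs, each boundary listed
    in connect-the-dots order. The first point is repeated at the end when
    needed so the boundary is closed, which gives NP = n + 1 for n sides.
    """
    for poly_id, points in polygons:
        pts = list(points)
        if pts[0] != pts[-1]:
            pts.append(pts[0])  # close the boundary
        yield f"{poly_id} {len(pts)}"  # header record: ID NP
        for x, y in pts:
            yield f"{x} {y}"  # one x-y pair (longitude, latitude) per record


zones = [
    (1, [(0, 0), (0, 1), (1, 1), (1, 0)]),                      # square
    (2, [(0, 0), (1, 0), (2, -1), (1, -2), (0, -2), (-1, -1)]),  # hexagon
]
print("\n".join(smp_records(zones)))
```

Writing one pair per record keeps the file readable and well under the 999-character record limit, however many points a boundary has.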


Polygon Identification 


The .SMP file should be sorted in ascending order of the absolute values of ID
(earlier ID values are smaller than later ones). Several polygons may have the same
ID. The Massachusetts map in the USSTATES.MAP file, for example, has three
polygons: one for the mainland, one for Nantucket, and one for Martha's Vineyard. A
separate header record (ID, NP) appears in the .MAP file for each of the three polygons.
IDs need not be sequential numbers (you can skip a number), but they must be
integers; otherwise, they will be truncated to integers (decimals will be discarded).

If an ID is negative, its associated polygon is not filled with a color or a fill
pattern. This is a useful device for representing rivers or roads within a state or
polygon. A Massachusetts map, for example, could have the following IDs: 4
(mainland), 4 (Martha's Vineyard), 4 (Nantucket), 4 (Cape Cod), -4 (Mass Turnpike),
-4 (Concord River), -4 (Route 128). All of the points following these IDs would be
plotted as Massachusetts.


The .SYZ File with Data 


The values of the variables for the zip code zones in the .SYZ data file are: 


MAPNUM MINLAT MAXLAT MINLON MAXLON LABLAT LABLON NAMES 


1   0  1   0  1   .5  .5  Zone 1
2  -2  0  -1  2 -1.0  .5  Zone 2


where, for each polygon: 


MAPNUM ID number

MINLAT Minimum latitude 
MAXLAT Maximum latitude 
MINLON Minimum longitude 
MAXLON Maximum longitude 
LABLAT Latitude of label (optional) 
LABLON Longitude of label (optional) 
The .SYZ file must contain all these variables (in any order) and must be sorted by
MAPNUM. If any of these variables is missing, Map does not work. The data file can
contain additional variables on each record. The optional LABLAT and LABLON
columns give the latitude and longitude at which to place each polygon's map label
(the label text is stored in NAMES).


Filenames


The .SYZ data file can have any name. The .SMP file must have that same name, with
the extension .SMP in place of .SYZ. Here are valid filename pairs:


MYDATA.SYZ MYDATA.SMP
MAPSTUFF.SYZ MAPSTUFF.SMP


The name of the .SMP file must exactly match that of the corresponding SYSTAT data file,
and the files should be stored in the same directory.


Islands


Islands are represented as part of their surrounding polygon. The following figure
shows an example of an island surrounded by an irregular polygon. If you trace the
numbers, you can see that the filled area excludes the triangular island in the
center. The path from 6 to 7 and from 10 to 11 is called a zero-width corridor. To
represent this polygon and its island, you enter 11 x-y coordinates into the
map file. The corridor shows on maps produced with SYSTAT unless you use Fill
to conceal it; if that is not acceptable, represent the island and the outer polygon as two
separate polygons.
How SYSTAT Plots a Map


To plot a map, SYSTAT selects an ID for a polygon by reading the MAPNUM variable
in the first record of the data file. Next, the .SMP file is read until an ID with a
matching absolute value is found. The subsequent NP points of the polygon
corresponding to that ID are plotted and, possibly, filled with color and fill pattern and
labeled. Next, the .SMP file is read further to check for remaining IDs that match the
current MAPNUM. Any polygons with IDs corresponding in absolute value to the
current value of MAPNUM are similarly plotted. If an ID is negative and its absolute
value corresponds to the current value of MAPNUM, its associated polygon is plotted
but not filled. When no further IDs corresponding to MAPNUM's value are found, the
next appropriate record in the data file is selected. Polygons for this ID are plotted,
and the process continues until no further appropriate records are available in the data
file. The MINLAT, MAXLAT, MINLON, and MAXLON variables are used for scaling
maps in the viewing window, and the LABLAT and LABLON variables are used for
labeling polygons.
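The matching procedure just described can be summarized in a few lines of Python. This is a sketch of the logic as the text describes it, not SYSTAT's code: the record layouts (a dict with a MAPNUM key for each data record, an (ID, points) pair for each polygon) and the draw callback are hypothetical. Because both files are sorted by absolute ID or MAPNUM, a real implementation can make one forward pass through the .SMP file; the sketch rescans the whole list for clarity.

```python
def plot_map(data_records, smp_polygons, draw):
    """Draw every polygon whose |ID| matches a data record's MAPNUM.

    data_records: rows of the .SYZ file, each a dict with a "MAPNUM" key.
    smp_polygons: (ID, points) pairs in .SMP file order.
    draw(points, filled, record): hypothetical drawing callback.
    """
    for record in data_records:
        mapnum = record["MAPNUM"]
        for poly_id, points in smp_polygons:
            if abs(poly_id) == mapnum:
                # A negative ID marks an outline (road, river): draw but do not fill.
                draw(points, poly_id > 0, record)


calls = []
# Stand-in point lists: a single label instead of real coordinates.
polys = [(4, ["mainland"]), (4, ["Nantucket"]), (-4, ["Route 128"]), (5, ["other"])]
plot_map([{"MAPNUM": 4}], polys,
         lambda pts, filled, rec: calls.append((pts[0], filled)))
print(calls)
```

Only the three polygons with |ID| = 4 are drawn, and Route 128 is drawn as an unfilled outline.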


Creating Map Files 


To create a map file: 


• Create a text file of the area's boundaries in “connect-the-dots” order (for example,
the coordinates for the .SMP file).
• With commands, specify:


IMPORT MYMAP.DAT / TYPE=MAP


This command automatically saves the corresponding .SMP file in the same
location.


• Make a .SYZ file of the minimum and maximum latitude and longitude for each
region.
To draw a map, you must read the .SYZ file; SYSTAT then finds the matching .SMP
file.
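The minimum and maximum latitude and longitude that the .SYZ file needs can be computed from the boundary coordinates themselves. A minimal Python sketch, assuming the polygons are held as (ID, [(x, y), ...]) pairs with x as longitude and y as latitude; a region made of several polygons (a state with islands, say) gets one combined box. The label position is placed at the center of the box purely for illustration; choosing LABLAT and LABLON is up to you.

```python
from collections import defaultdict

def syz_rows(polygons):
    """One dict per region with MAPNUM, MINLAT, MAXLAT, MINLON, MAXLON, LABLAT, LABLON."""
    pts_by_region = defaultdict(list)
    for poly_id, points in polygons:
        # Negative IDs (unfilled outlines) belong to the same region as positive ones.
        pts_by_region[abs(poly_id)].extend(points)
    rows = []
    for mapnum in sorted(pts_by_region):  # the .SYZ file must be sorted by MAPNUM
        lons = [x for x, _ in pts_by_region[mapnum]]
        lats = [y for _, y in pts_by_region[mapnum]]
        rows.append({
            "MAPNUM": mapnum,
            "MINLAT": min(lats), "MAXLAT": max(lats),
            "MINLON": min(lons), "MAXLON": max(lons),
            # Illustrative label position: the center of the bounding box.
            "LABLAT": (min(lats) + max(lats)) / 2,
            "LABLON": (min(lons) + max(lons)) / 2,
        })
    return rows


zones = [
    (1, [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]),
    (2, [(0, 0), (1, 0), (2, -1), (1, -2), (0, -2), (-1, -1), (0, 0)]),
]
for row in syz_rows(zones):
    print(row)
```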


Drawing Other Shapes 


You may have figured out that the .SMP file is a way to plot any filled or empty shape 
in SYSTAT. You can take a digitizer tablet, for example, and trace letters, figures, and 
symbols from published material or laboratory data. Most CAD programs convert 
these digitized tracings into x, y coordinate ASCII files. If you import these files into a 
.SMP file, you can plot irregular shapes with SYSTAT. 


Sources of Map Files 


Standard latitude and longitude text files of city, county, state, and international maps 
that SYSTAT can process are available. SYSTAT includes U.S. map boundary files
developed from the Census Bureau's TIGER geographic database in SYSTAT's 
.SMP/.SYZ formats on the CD-ROM. Census tract, block group, incorporated place 
(city, town, or village), and minor civil subdivision (township) boundaries are available 
on either a statewide or individual county basis. 

Included with the boundary files is an extensive extract from the Census Bureau's 
Summary Tape 3A for the requested area and geography. This extract includes over 
200 variables profiling each area's social, economic, and housing characteristics as 
reported in the 1990 census, including household type and size, ethnic composition, 
level of education completed, labor force participation, housing costs, and extensive 
income and poverty indicators. You may also request other specific data series, if 
available. SYSTAT does not provide geo-coding (the ability to locate a point based on 
an address) at this time. Therefore, to map the data that you have collected, you need 
to know either the area (for example, the census tract) within which each of your data 
points falls or the longitude/latitude coordinates of the area you want. 




In addition, you can import ArcView files; SYSTAT saves imported ArcView files
as .SYZ and .SMP files.


Examples 


Example 1 
Map of the United States 


Here is an example of an unlabeled map of the contiguous United States drawn using 
the default Mercator Conformal projection. 


The input is: 
USE USSTATES
MAP


The output is: 


Shading 


In this example, the degree of shading for each state is determined by the value of the 
variable TORNADOS. Florida and Oklahoma have the highest incidence of tornados. 
The input is: 


USE USSTATES 
MAP / FILL=TORNADOS
The output is:


Example 2 
Map of a Selected Region 


You can select certain cases to map only part of your file, just as you can make graphs 
using subsets of your data. For example, you can select only the Midwest states. 


The input is: 
USE USSTATES 


SELECT REGION$='Midwest'
MAP 


The output is: 
Adding Symbols and Text
We can use DRAW and WRITE to mark the location of Chicago with an arrow.
The input is:
BEGIN
USE USSTATES
SELECT CASE < 49 AND REGION$='Midwest'
MAP
DRAW ARROW / FROM=4IN,3.75IN TO=3.15IN,1.85IN
WRITE "Chicago" / LOC=3.75IN,4IN
END
The output is:
Example 3 


Geographic Projections 


Projections are transformations of spherical coordinates to rectangular coordinates.
You can think of them as mathematical methods for taking an orange peel and
stretching it to lie flat on a table. Map offers 10 projections, listed under Projection.
Below are maps of the United States using eight different projections. We add dotted
lines that automatically mark the longitude and latitude at 10° increments using XGRID
and YGRID.
The input is:


USE USSTATES
BEGIN
MAP / PROJ=GNOMON SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ' TITLE="Gnomonic",
LOC=-4IN,15IN
MAP / PROJ=STEREO SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ' TITLE="Stereo",
LOC=4IN,15IN
MAP / PROJ=MERC SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ' TITLE="Mercator",
LOC=-4IN,10IN
MAP / PROJ=ORTHO SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ' TITLE="Ortho" LOC=4IN,10IN
MAP / PROJ=LAMBERT SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ',
TITLE="Lambert Equal Area Cylindrical",
LOC=-4IN,5IN
MAP / PROJ=ROBINSON SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ' TITLE="Robinson",
LOC=4IN,5IN
MAP / PROJ=MILLER SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ' TITLE="Miller",
LOC=-4IN,0IN
MAP / PROJ=SINUSOIDAL SC=BOX AXES=BOX XGRID YGRID,
XLAB=' ' YLAB=' ' TITLE="Sinusoidal",
LOC=4IN,0IN
END
World Map Projections
We now show a map of the world using the Peters, Miller Cylindrical, Mercator
Conformal, and Lambert Cylindrical projections.
The input is:
USE WORLD
BEGIN
MAP / PROJECT=PETERS SC=BOX AXES=BOX XGRID,
YGRID XLAB=' ' YLAB=' ' TITLE="Peters" LOC=-4IN,8IN
MAP / PROJECT=MILLER SC=BOX AXES=BOX XGRID,
YGRID XLAB=' ' YLAB=' ' TITLE="Miller" LOC=4IN,8IN
MAP / PROJECT=MERC SC=BOX AXES=BOX XGRID,
YGRID XLAB=' ' YLAB=' ' TITLE="Mercator" LOC=-4IN,0IN
MAP / PROJECT=LAMBERT SC=BOX AXES=BOX XGRID,
YGRID XLAB=' ' YLAB=' ' TITLE="Lambert" LOC=5IN,0IN
END


The output is: 
Example 4
Maps and Plots 


You can use BEGIN...END to plot data on top of maps. If you specify two graph 
commands between BEGIN...END, the plots print on top of each other. 

In this example, we plot the rainfall for each of the 48 contiguous states in the U.S.
We standardize the rainfall to a 0-to-1 range (STAND with the RANGE option) and let the
symbol size reflect the result.
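As an aside on the arithmetic, range standardization is assumed here to rescale each value as (x - min) / (max - min), so the driest state gets 0 and the wettest gets 1. A minimal Python sketch with hypothetical rainfall values (not the USSTATES data):

```python
def range_standardize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]

rain = [10.0, 25.0, 40.0, 55.0]  # hypothetical rainfall values
print(range_standardize(rain))
```

Because the result always lies between 0 and 1, it can drive SIZE directly without producing negative or oversized symbols.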


The input is: 


USE USSTATES 
SELECT CASE < 49
STAND RAIN / RANGE
BEGIN
MAP / PROJ=STEREO
PLOT LABLAT * LABLON / SIZE=RAIN SYMBOL=1,
PROJ=STEREO AXES=NONE,
SCALE=NONE FILL=1
END


The output is:


Plotting Icons on a Map 


Because icons are graphic representations of data, it can be useful to plot them on top 
of maps in order to identify trends across geographic areas. Use the Icon Plot dialog 
box to plot icons. 


The input is: 


BEGIN 
USE USSTATES 
SELECT CASE < 49 AND REGION$='Midwest'
MAP / PROJECT=STEREO
ICON MATH VERBAL / THERM PROJECT=STEREO,
ILOC=LABLON,LABLAT SIZE=.5
END


The output is: 


Drawing a 3-D Smooth Over a Map 


Although they can distort, perspective maps can be useful. An important step in 
producing them with SYSTAT is to line up the scales for the plot over the map with the 
scales for the map itself. To do this with the United States map, specify FACET XY to place
the map on the floor. FACET specifies the plane onto which subsequent 2-D graphs are 
plotted. 


Here, we use a stereographic projection for the map and plots. To avoid doubling the 
axes and scales, set AXES to None and SCALE to None for the map. 
The input is:
USE USSTATES
SELECT CASE < 49
BEGIN
FACET XY
MAP / PROJECT=STEREO AXES=NONE SCALE=NONE
PLOT RAIN*LABLAT*LABLON / SMOOTH=INVERSE,
PROJECT=STEREO CUT=40,
TENSION=.15 AXES=SPOON,
SCALE=L
FACET
END
The output is:
Example 5 
Global Maps 


You can draw a map on a globe and change the viewpoint. The first plot created in this 
example is a map of the Western Hemisphere. To plot this map, make sure both 
WORLD.SYS and WORLD.MAP are in the same directory. We then rotate this 
representation to make Africa visible. 
The input is:
USE WORLD
BEGIN
MAP / SPHERE LOC=-2.2,0
EYE 18 20 16 / SPHERE
MAP / SPHERE LOC=2.2,0
EYE
END


The output is: 


Example 6
Latitude Longitude


You can draw lines of latitude and longitude on a map, as they usually appear in an atlas.


The input is: 
USE WORLD 

MAP / AXES=BOX SCALE=c YLIM=-23.5,23.5,
PROJECT=ROBINSON XTICK=18 YTICK=10 XGRID YGRID
The output is:
Options and Features for Graphs


Leland Wilkinson 
(revised by B. Rajakumar and S. Anoopama) 


SYSTAT provides several options for fine tuning and customizing graphical displays. 
For example, you can select plot symbols, change the width and/or height of the 
display, limit the range of plot axes, log or power transform the data, add standard 
error bars, use extra-thick lines, add a title, and more. 

Local options affect only the display for which they are requested. For example, if 
you specify orange symbols for a scatterplot and next request a bar chart, only the 
scatterplot has orange symbols. The options related to a particular display are made 
available in the dialog boxes. All options available in the graph's dialog box are local
to that particular display. Command users simply add the word for the option after the 
slash. 

Global options affect the appearance of every display until you change them. For 
example, if you request double-size characters, the character size is used for your first 
scatterplot and subsequent bar graphs, histograms, SPLOMs, and so on. These 
features are found on the Graph tab of the Global Options dialog box. 

You can also use the tools in the Graph Editor tab to edit, save, print, and 
dynamically rotate currently displayed graphs. 


Local Options 


Local options are specific to a single graph or plot. When you create a new graph, you 
can see which local options are available for your particular graph by clicking the 
Options, X-Axis, Y-Axis, Z-Axis, W-Axis, All Axes, Layout, Color, Fill, Symbol and
Label, and Surface and Line Style tabs in the graph's dialog box. Experiment with these
settings to achieve the desired look for your graph. 
Enhancing Graphical Displays


In addition to the Options available in the graph's dialog box, SYSTAT offers many 
other options to enhance your graphical displays. These include error bars, coordinates, 
plots of residuals, and smoothing options that are available in various tabs of the 
graph's dialog box. The options described below are available for several different 
types of graphs. Detailed information about smoothers is available in Chapter 5. 

Most types of graphs have unique options, each of which is described in the chapter 
covering the graph type. 
Error Bars


For bar, dot, pyramid, profile, and line displays, SYSTAT provides conventional error
bars to indicate the variability of the measure displayed. As an alternative to the usual
vertical line bounded by horizontal ticks, SYSTAT also offers a box. Use the Error Bars
tab to specify these and some additional options.
Standard error. For standard error bars, specify a number between 0 and 1 for the
confidence level (p). For one standard error of the mean (the default), specify 0.6827; 
two standard errors, 0.9545, etc. 


Standard deviation. Error bars in standard deviation units. Specify a number between 
0 and 1 for the confidence level (p). 


Value of variable. Error variable. A variable in your data file that represents the errors. 


Set direction (+/-) by. Variable specifying the direction of the error bar. Values of this
variable control direction (0 = none; positive values, up; negative values, down).


Width of error bars. Length of horizontal ticks that bound the error bar or width of the 
error box. Specify a number between 0 and 1, where for 1.0, the tick (or box) is as wide 
as a bar in a bar chart; 0.5, the length (or width) is 50% of what a bar would be, etc. 


Style. Line with ticks, the default, is a vertical line bounded by horizontal ticks. Box is
a vertical box.
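The p values quoted under Standard error are normal coverage probabilities: an interval of k standard units on either side of the mean covers p = erf(k/√2) of a normal distribution, which gives 0.6827 for k = 1 and 0.9545 for k = 2. The Python sketch below (our own helper names) converts in both directions, which helps when you want, say, 95% bars instead of a whole number of standard errors. That SYSTAT uses exactly this normal mapping is an assumption consistent with the values given above.

```python
import math
from statistics import NormalDist

def p_for_k(k):
    """Two-sided normal coverage of the interval mean +/- k standard units."""
    return math.erf(k / math.sqrt(2))

def k_for_p(p):
    """Standard units needed for two-sided coverage p (0 < p < 1)."""
    return NormalDist().inv_cdf((1 + p) / 2)

print(round(p_for_k(1), 4), round(p_for_k(2), 4))  # 0.6827 0.9545, as quoted above
print(round(k_for_p(0.95), 2))                      # 1.96 units for a 95% bar
```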
Several types of coordinates, including polar, spherical, and triangular, are available, as
well as projections, for the appropriate data. Use the Coordinates tab to change the
coordinate system.


Rectangular. Also known as Cartesian coordinates. Each point is determined by its
distance from the origin along the x and y axes (two dimensions) and the z axis (three
dimensions).
Polar. Polar coordinates translate rectangular coordinates into a circular arrangement.
Each point is determined by its distance (r) from the origin and the angle (θ) between
the positive x axis and the vector from the origin to the point. For three-dimensional
plots, there is a third coordinate that gives the z height of the point. Thus, polar
coordinates in 3-D plots are really cylindrical coordinates.

Polar coordinates are most often used where direction and extent are the most 
meaningful expressions of the relation between two variables. More generally, 
however, polar coordinates are a mathematical transformation of the expression of an 
image. 


Spherical. You can plot 3-D graphs in spherical coordinates. The three variables you
specify are plotted as [θ, r, φ]. r is the distance from the point to the origin, θ is the angle
on the horizontal plane measured from the positive x axis, and φ is the angle from the
positive z axis to the line segment between the point and the origin. If you specify two
variables, the points are plotted on the surface of a unit sphere (r = 1).


Triangular. You can plot three variables in two dimensions if you use triangular
coordinates. The three vertices of the triangle correspond to the three-dimensional x, y,
and z axes.

Consider the figure below. The graph on the left shows a perspective plot of three
variables. If we assume that all points in this plot (X1, X2, X3) sum to a constant
(SYSTAT makes them sum to 1), then they fall on a plane, which is represented by the
dark triangle in the plot. If we place this plane on the plotting surface, then the original
axes correspond to the three vertices of a triangle.


Projection. Projections are transformations of spherical to rectangular coordinates. 
You can choose from Oblique gnomonic, Oblique stereographic, Oblique 
orthographic, Mercator conformal, Lambert cylindrical, Miller cylindrical, Robinson, 
Peters, Sinusoidal, or Fisheye. 
3-D Triangular plots. For scatterplots only, if you are using a triangular coordinate
system, you can select a w variable to plot four variables in three dimensions. The w 
variable defines the distance from the point to the triangular (x-y-z) plane. 
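The coordinate systems above reduce to a few standard formulas. The Python sketch below uses the manual's symbols (θ measured in the horizontal plane from the positive x axis, φ measured from the positive z axis) with angles in radians. The triangle's vertex layout (X1 at the lower left, X2 at the lower right, X3 at the top of an equilateral triangle with unit sides) is our own assumption, and SYSTAT's orientation may differ; none of the function names are SYSTAT commands.

```python
import math

def polar_to_xy(r, theta):
    """Polar (r, theta) -> rectangular (x, y); keep z unchanged for cylindrical 3-D plots."""
    return r * math.cos(theta), r * math.sin(theta)

def spherical_to_xyz(theta, r, phi):
    """Spherical [theta, r, phi] -> rectangular (x, y, z)."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))

# Assumed corner positions for X1, X2, X3 in the triangular plot.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))

def triangular_xy(x1, x2, x3):
    """Nonnegative (x1, x2, x3) -> 2-D position inside the triangle."""
    total = x1 + x2 + x3
    if total <= 0:
        raise ValueError("components must have a positive sum")
    weights = (x1 / total, x2 / total, x3 / total)  # rescaled to sum to 1
    px = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    py = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return px, py

# With two variables, r = 1 and the point lies on the unit sphere.
x, y, z = spherical_to_xyz(math.pi / 4, 1.0, math.pi / 3)
print(round(x * x + y * y + z * z, 12))
print(triangular_xy(1, 1, 1))  # equal parts: the center of the triangle
print(triangular_xy(5, 0, 0))  # all X1: the X1 corner
```

For a 3-D triangular plot, the w variable would simply add a height above this 2-D position.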


Axes
SYSTAT offers many options for customizing the appearance of the axes of your
graphs. You can apply some features to all of the axes at once using the All Axes tab.
You can apply other features to a single axis using the X-Axis, Y-Axis, or Z-Axis (or
W-Axis) tabs.
All Axes


Before you create a graph, you can format certain features of all of the axes at once.
Specifically, you can select which axes and scales to display, and you can transpose a
plot.


Axes to display. Select the axis configuration from the drop-down list. The following 
configurations are available: 
2-D axes: Box, Vertical, Horizontal, None, Bottom, Top, Left, Right, Cross
3-D axes: L, Book, Box, None


Scales to display. Select the scales you want to display from the configurations in the 
drop-down list. (This display is independent of the axes displayed.) 

Tick mark style. The tick marks can be placed inside, outside, or through the axes. 
Tick mark location. The default position for the first and last tick marks is Flush with
the maximum and minimum. To avoid overlapping the first and last tick marks on the
various axes, select Indent. Float allows SYSTAT to select "nice" scale numbers.


Transpose X-Y axes. Rotates the display 90 degrees, switching the positions of the x
and y axes (for example, a sideways bar chart).
Specific Axes
Before you create a graph, you can set various options specific to each axis. You can
set minimum and maximum values on plot scales, specify labels for axes, add a grid to
the plot, and transform the data using log or power transformations.


Axis label. Type a label of up to 99 characters. To suppress the label for this axis, type 
a blank space. 


Scale range. Specify the maximum and minimum values to appear on the axis. Any 
data values outside these limits will not appear on the display. 


Transformation. Log or power transformations are often used to make the shape of a 
distribution symmetric or to stabilize variances across groups. 
• Power. For a power transformation, specify a number between -3 and +3. A value
of -1 corresponds to taking the reciprocal of the values. A value of 2 corresponds
to squaring every value.

• Log. For a log transformation, specify an integer between 2 and 10 (inclusive) to
designate the logarithm base.

You can also log-transform your data in SYSTAT's worksheet, but then the scales would
be in log units. The advantage of using the log transformation on the plot axes is that
the scales are displayed in the original data units.


Tick mark intervals. The value in Labeled (Tick) determines the number of intervals 
marked by ticks between the minimum and maximum values on the axis. For example, 
if you specify 4, there will be four intervals between the minimum and maximum, and 
three tick marks. The value in Unlabeled (Pip) determines the number of intervals 
marked between adjacent tick marks. 

Limit lines. To display lines perpendicular to the axis, specify values (within the axis 
range) for an upper line, a lower line, or both. 


Reverse scale. Arranges the values in reverse order, from maximum to minimum. 


Scale format. Indicates the number of digits after the decimal in the labels on each tick 
mark. 


Display grid lines. Grid lines perpendicular to the axis are displayed at each tick mark. 
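The arithmetic behind a transformed axis and its tick marks can be sketched in a few lines. In this Python model (our own function names, not SYSTAT internals), a value's position along the axis is computed on the transformed scale while its label stays in the original data units, which is the advantage of transforming the axis rather than the data. Ticks are assumed to be evenly spaced on an untransformed axis; SYSTAT's own choice of "nice" values may differ.

```python
import math

def transform(v, log_base=None, power=None):
    """Apply the axis transformation (log base or power) to one data value."""
    if log_base is not None:
        return math.log(v, log_base)
    if power is not None:
        if power == 0:
            raise ValueError("power must be nonzero")
        return v ** power
    return v

def axis_position(value, lo, hi, **kw):
    """Fractional position of value on the axis: 0 at lo, 1 at hi."""
    t_lo, t_hi = transform(lo, **kw), transform(hi, **kw)
    return (transform(value, **kw) - t_lo) / (t_hi - t_lo)

def tick_positions(lo, hi, n_intervals, pips_per_interval=1):
    """Labeled ticks (n + 1 values, ends included) and unlabeled pips on a linear axis."""
    step = (hi - lo) / n_intervals
    ticks = [lo + i * step for i in range(n_intervals + 1)]
    pip_step = step / pips_per_interval
    pips = [t + j * pip_step for t in ticks[:-1] for j in range(1, pips_per_interval)]
    return ticks, pips

# On a log-10 axis, labels 1, 10, 100, 1000 (original units) fall at equal spacing.
for label in (1, 10, 100, 1000):
    print(label, round(axis_position(label, 1, 1000, log_base=10), 3))

# Labeled (Tick) = 4 and Unlabeled (Pip) = 5 on a 0 to 100 axis.
ticks, pips = tick_positions(0, 100, 4, pips_per_interval=5)
print(ticks)      # [0.0, 25.0, 50.0, 75.0, 100.0]
print(len(pips))  # 16: four unlabeled marks inside each interval
```

With 4 intervals, three of the five tick values fall strictly between the minimum and maximum, matching the description of Tick mark intervals.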
Layout


To help you perfect the size and layout of your graphs, all SYSTAT graphs have a
layout option. This option allows you to add titles, adjust graph size, and arrange
multiple graphs on a single page. In general, to change the layout options, click the
Layout tab in the dialog box you are using to create the graph.


In addition, SYSTAT provides a single-page mode, which places all graphs on a
single page.




Graph title. Type a title to appear above the graph. 


Graph location from origin. Allows you to set the position of a graph on the page by
specifying x and y coordinates. This is useful for arranging multiple displays in the
Graph Editor tab. The coordinates for the location are measured as the displacement of
the lower left corner of the graph from its default position. The values entered here
apply to the current graph only. By default, the lower left corner of a graph is at the
origin (2.25 inches from the left edge and 5.5 inches down from the top of the page).
• Right/left from X. A positive value moves the graph to the right and a negative
value moves it to the left.
• Up/down from Y. A positive value moves the graph up, and a negative value moves
it down.
In the figure below, the point shows the default position at the bottom left corner of a
display. The lower left corner is ORIGIN=2.25IN,-5.5IN, and Right/left from X and
Up/down from Y are both 0. Specifications in Graph location from origin alter an
individual plot and apply to only one graph, whereas Origin is a global feature that
remains for subsequent displays.


(Figure: default graph position, with ORIGIN 2.25 IN, -5.5 IN (default) and LOC = 0 IN, 0 IN; page axes in inches.)


If you specify DRAW BOX, you can replicate this drawing on the screen and print the result to verify the position of the Graph Origin with your printer and paper alignment. The following figures illustrate that if you set Right/left from X = 2 and Up/down from Y = 1 (measurement units = inches), the graph moves right two inches and up one inch from the origin. If you set Graph Origin at (1 inch, -5 inches), the value for Right/left from X is still 0 inches, and Up/down from Y is 0 inches at this new origin position.
(Figures: left, LOC = 2 IN, 1 IN relative to the default ORIGIN 2.25 IN, -5.5 IN; right, LOC = 0 IN, 0 IN relative to ORIGIN 1 IN, -5 IN.)


The coordinates are specified in the units shown in the dialog box. These are set 
globally. 
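The displacement rules above amount to simple addition. The Python sketch below is an illustration, not SYSTAT syntax: it assumes the lower left corner lands at the global origin plus the per-graph Right/left and Up/down offsets, all in the current measurement units, and the helper name `graph_corner` is hypothetical. It reproduces the cases shown in the figures.

```python
def graph_corner(origin, offset):
    """Page position of a graph's lower left corner (hypothetical helper).

    origin: global ORIGIN as (x, y), in the current measurement units.
    offset: per-graph Graph location from origin as
            (Right/left from X, Up/down from Y), in the same units.
    Assumes the offset is simply added to the origin.
    """
    ox, oy = origin
    dx, dy = offset
    return (ox + dx, oy + dy)


DEFAULT_ORIGIN = (2.25, -5.5)  # inches, the documented default

# Default placement, then the LOC = 2 IN, 1 IN case from the figures.
print(graph_corner(DEFAULT_ORIGIN, (0, 0)))  # (2.25, -5.5)
print(graph_corner(DEFAULT_ORIGIN, (2, 1)))  # (4.25, -4.5)
# Moving ORIGIN to (1, -5) with zero offsets puts the corner at the new origin.
print(graph_corner((1, -5), (0, 0)))         # (1, -5)
```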


Graph size. The specifications for Height and Width (and Altitude for 3-D displays) 
apply to the frame of the graph itself, not including labels for axes, titles, etc. To scale 
the entire display, select the Graph tab after choosing Options from the Edit menu. 


Measurement units. The units for distances specified in the dialog box. You can 
change units for all graph dialog boxes by selecting the Graph tab after choosing 
Options from the Edit menu. 


Multiple graph arrangement. When you use multiple variables or groups, you can 
specify the number of rows or columns of graphs to appear on the page. By default, 
SYSTAT arranges the individual displays to maximize their size. 


Cleveland median slope adjustment. This adjustment automatically scales line graphs 
for data such as time series. It adjusts the height and width of the graph so that the 
median absolute physical slope of the plotted line segments is 1, following the theory 
and experiments of Cleveland. 
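This adjustment is often described as banking to 45 degrees. SYSTAT's internal implementation is not described here, so the Python sketch below only illustrates the arithmetic, under the assumption that the plot frame spans exactly the data ranges of x and y; the function name is hypothetical. It returns the height/width ratio at which the median absolute physical slope of the connected segments equals 1.

```python
from statistics import median


def median_slope_aspect(x, y):
    """Height/width ratio of the plot frame that makes the median absolute
    physical slope of the line segments equal to 1.

    Assumes the frame spans exactly the data ranges of x and y.
    """
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    # Absolute data slopes of consecutive segments; vertical ones are skipped.
    slopes = [abs((y1 - y0) / (x1 - x0))
              for x0, y0, x1, y1 in zip(x, y, x[1:], y[1:])
              if x1 != x0]
    if not slopes or x_range == 0 or y_range == 0:
        raise ValueError("need a series with nonzero x and y ranges")
    m = median(slopes)
    if m == 0:
        raise ValueError("median slope is 0; the series cannot be banked")
    # On a frame of height H and width W, a segment with data slope s is drawn
    # with physical slope s * (x_range / y_range) * (H / W).
    # Setting the median of those slopes to 1 and solving for H / W:
    return y_range / (x_range * m)


x = [0, 1, 2, 3, 4]
y = [0, 2, 1, 3, 2]
print(median_slope_aspect(x, y))  # 0.5
```

With that ratio, a frame 4 inches wide would be drawn 2 inches tall.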
Legend


The Legend tab is enabled when you draw multiple graphs in a single frame; all SYSTAT graphs have this option. This option allows you to display a graph legend (or key) and to specify the legend location, legend title, and legend labels. The legend provides a key to the symbols, fills, lines, or colors that distinguish the variables or groups. Display legends using the Legend tab of the graph's dialog box.


(Figure: Legend tab of the Graph:Summary Charts:Dot dialog box.)

Display legend. To remove the legend from the display, deselect this item. 
Legend location from origin. Specify the location of the lower left corner of the legend in relation to the lower left corner of the graph (the origin).


Measurement units. The units for distances specified in the dialog box. You can 
change units for all graph dialog boxes by selecting the Graph tab in the Global 
Options dialog box. 


Legend title. You can specify a title to appear above the legend. 


Legend item labels. You can specify labels for legend items. These should be in either 
the order the variables are specified or, if a grouping variable was used to split the data, 
the order of the group values. 


Overlay Mode 


Overlay Mode is used to place several graphs on one page (by choosing Begin Overlay 
Mode and End Overlay Mode from the Graph menu). You can overlay the graphs or 
separate them by specifying positions using Graph location from origin in the Layout 
tab of the graph's dialog box. 


Appearance 


The appearance of a graph is important if you plan to use it in publications or 
presentations. SYSTAT allows you to customize the look of your graphs by changing 
color, fill patterns, surface styles, line styles, and symbols and labels. 


Color 


You can control the colors on a graph by using the Color tab for each graphical display. 
(Figure: Color tab of the Graph:Distribution Plots:Quantile Plot dialog box.)


Element color. You can specify colors of plot elements (symbols in a scatterplot, boxes in a box plot, bars of a bar chart, wedges in a pie chart, etc.) in one of two ways:

■ Select a color from the color drop-down list for each of the y variables. On selecting Customize, you can choose from any of 64 shades of color, organized over a spectrum of cool to warm. You can also double-click the Color Value edit box and enter any value between 0 and 1.0, or an integer from 1 to 12, to get the corresponding shade.

■ Select a variable that you have created to specify color. The value for each case is used to set the color for the plot element corresponding to that case. The variable can be a character variable with the words red, blue, green, and so on, or it can be a numeric variable containing integers 1 to 12 or values between 0 and 1.0.


Axes color. The color of axes, scales, and labels. Select a color from the drop-down list. 
Interior color. The color of the interior of the plot. Select a color from the drop-down 
list. 


Whether you are using the dialog boxes, variables, or commands to specify colors, integers from 1 to 12 correspond to colors as follows:


Integer  Name      Integer  Name
1        Red       7        Gray
2        Blue      8        Violet
3        Green     9        White
4        Yellow    10       Black
5        Orange    11       Cyan
6        Brown     12       Magenta


Alternatively, to see what colors are associated with various integers, open a data file and then try the following.


The input is:


SELECT case<13
LET a=case
LET y=0.5
CATEGORY a
PLOT y / XGRID COLOR=a SYMBOL=8 SIZE=2.5,
FILL HEI=1.5IN WID=4IN YTICK=1 YLAB=' '
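When you build a color variable outside SYSTAT, a lookup like the following Python sketch can check its values before plotting. It covers only the integer codes and color words from the table; how values between 0 and 1 map to shades is not documented here, so the sketch rejects them. The names are hypothetical.

```python
# Integer color codes from the table above.
SYSTAT_COLORS = {
    1: "Red", 2: "Blue", 3: "Green", 4: "Yellow", 5: "Orange", 6: "Brown",
    7: "Gray", 8: "Violet", 9: "White", 10: "Black", 11: "Cyan", 12: "Magenta",
}


def color_name(value):
    """Resolve a color value (integer code or color word) to a table name."""
    if isinstance(value, str):
        name = value.strip().capitalize()
        if name in SYSTAT_COLORS.values():
            return name
        raise ValueError(f"unknown color word: {value!r}")
    if isinstance(value, int) and value in SYSTAT_COLORS:
        return SYSTAT_COLORS[value]
    raise ValueError(f"not an integer color code from 1 to 12: {value!r}")


print([color_name(v) for v in (1, 7, 11, "blue", " Magenta ")])
# ['Red', 'Gray', 'Cyan', 'Blue', 'Magenta']
```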
Fill


You can choose a fill pattern for a graph by using the Fill tab for each graphical display.


(Figure: Fill tab of the Graph:Scatterplot dialog box.)


Fill pattern. Enables you to specify the pattern inside bars, symbols, etc. You can specify the fill pattern in one of the following two ways:


■ Select a fill pattern from the drop-down list for each of the y variables specified.

■ Select a variable that you have created to contain integers between 1 and 7 for patterns, or numbers between 0 and 1 for shades of gradation between empty and solid (white and black, unless you have chosen another color). SYSTAT can produce 15 shades of gray represented by numbers between 0 and 1. When you type a number between 0 and 1, SYSTAT chooses the shade of gray nearest to that number. (If you select a numeric variable, you should use Category to define the variable as categorical.)


For the fill patterns, choose among the following alternatives:


(Figure: fill patterns 1 through 7.)


Surface and Line Style 


In SYSTAT, you can incorporate lines in a graphical display in several ways:
■ Connect plot points in the order in which their cases appear in the data file.
■ Define a line style (for example, solid, dashed, dotted) for elements within displays.
■ Draw dashed grid lines across the display.
■ Add one or two reference or limit lines (for example, for the variable temperature, draw a dashed reference line at 32 degrees).


Connecting points. In the Options tab of a scatterplot dialog box, you can connect plot points in the same order in which they appear in the data. For dot graphs, select Options and then Line connected in left-to-right order. Using this option with unordered data will probably produce a messy spider web of lines. This option is often used for time series data, where cases are sequential.


Cleveland median slope adjustment. This option in the Layout tab automatically scales 
line graphs for data such as time series. It adjusts the height and width of the graph so 
that the median absolute physical slope of the plotted line segments is 1, following the 
theory and experiments of Cleveland. It is highly recommended for time series plots. 
The equivalent command is SLOPE. 


Use the Surface and Line Style tab to customize the appearance of lines or surfaces.


(Figure: Surface and Line Style tab of the Graph:Scatterplot dialog box.)


The following options are available: 


Line style. If you have more than one line, each line can have a distinctive style, such as solid, dashed, or dotted. Choose a style from the drop-down list. Line style choices include:

(Figure: available line styles.)

Cutting surface. For a 3-D surface, you can specify the direction, color, or pattern of the cutting surface. X Cut Lines, Y Cut Lines, and Z Cut Lines are drawn perpendicular to the x, y, or z axis, respectively. XY Cut Lines are drawn in two directions. Choose Colored Solid for a surface consisting of multiple colors, or Patterned Solid to display patterns in black and white. Shaded Solid, the default, produces shades of one color.


Number of grid cuts. Specify the number of cuts in the grid. For contour plots in two dimensions, SYSTAT computes a 25 x 25 grid, and for surface plotting in three dimensions, a 30 x 30 grid. The maximum number of cuts is 60 and the minimum is 2. You should rarely need fewer than 10 or more than 40.

Use 10 cuts to save computer time when you are trying to get a rough sketch of a surface. If you want a more detailed view, use 40 cuts. You may need an even larger value for some complex mathematical functions with steep cliffs.
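Because the surface is evaluated at every node of the grid, the work grows with the square of the number of cuts. The Python sketch below assumes that the number of cuts counts grid lines along each axis and applies the documented 2 to 60 limits; the helper name is hypothetical.

```python
def grid_coordinates(lo, hi, cuts):
    """Evenly spaced cut positions along one axis, clamped to 2..60 cuts.

    Assumes 'cuts' counts the grid lines along an axis, so an n-cut grid has
    n * n nodes at which the surface must be evaluated.
    """
    cuts = max(2, min(60, cuts))
    step = (hi - lo) / (cuts - 1)
    return [lo + i * step for i in range(cuts)]


for n in (10, 25, 30, 40):
    xs = grid_coordinates(0.0, 1.0, n)
    print(n, "cuts ->", len(xs) ** 2, "grid nodes")
# 10 cuts -> 100 grid nodes ... 40 cuts -> 1600 grid nodes
```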
Symbol and Label


Using the Symbol and Label tab, you can choose from a variety of distinctive symbols to mark the points on a graphical display. You can also label individual points.


(Figure: Symbol and Label tab of the Graph:Scatterplot Matrix (SPLOM) dialog box.)


Symbol type. Choose from among SYSTAT's 23 built-in symbols, or use a keyboard character.


(Figure: SYSTAT's 23 built-in plot symbols.)


■ Select symbol. You can select the symbols from the symbol drop-down list for each of the y variables.

■ Enter character. You can specify a character, such as #. All groups use this symbol, and the groups are distinguished by color.

■ Select variable. You can select a variable that you have created to specify symbols. The value for each case is used to set the symbol corresponding to that case. If the selected variable is numeric, the symbols are assigned from SYSTAT's list of 23 built-in symbols, in the order displayed on the drop-down lists. If a character variable is selected, the first letter of each character value is used as the plot symbol.
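The two rules for a symbol variable can be summarized in a short Python sketch. It assumes numeric values are 1-based positions in the list of 23 built-in symbols (in drop-down order) and represents a built-in symbol only by that number; the helper name is hypothetical.

```python
BUILTIN_SYMBOLS = 23  # number of SYSTAT built-in symbols


def plot_symbol(value):
    """Symbol chosen for one case by a symbol variable (hypothetical helper).

    Character values: the first letter is used as the plot symbol.
    Numeric values: assumed to be 1-based positions in the built-in list,
    in drop-down order; the result is that position.
    """
    if isinstance(value, str):
        if not value:
            raise ValueError("an empty string has no first letter")
        return value[0]
    if isinstance(value, int) and 1 <= value <= BUILTIN_SYMBOLS:
        return value
    raise ValueError(f"no built-in symbol numbered {value!r}")


print([plot_symbol(v) for v in ("Europe", "Asia / Pacific", 4)])
# ['E', 'A', 4]
```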


Symbol size. These selections change the size of the plot symbols.


■ Enter size. The default size is multiplied by the number you specify. Use a value of 0 to plot without symbols.

■ Select variable. Specify a numeric variable whose scaled values determine the size of the symbols for each case.

Case labels. Each case can be labeled with the value of a character variable.

■ Display case labels. To label each case, select this option first.

■ Label size. The size of the label is relative to the default. For example, if you select 2, the label is double the default size.

■ Select variable. The label for each case is the value of the selected variable.


Global Graph Options 


Like the local options in the previous section, global options affect the appearance of your plot. However, there is one important difference. Local options affect only the type of graph for which they are set. If you switch to a different type of graph (for example, from bar to pie chart), all the options for the new graph are reset to the defaults. In contrast, global settings remain set for every graph until you reset them, even if you restart SYSTAT. That is, if you globally set line thickness to make lines twice as thick as the default, every bar chart, every dot plot, etc., will have thick lines until you change the setting.

To change your global options settings, click the Graph tab after selecting Options 
from the Edit menu. 


(Figure: Graph tab of the Edit: Options dialog box.)


Reduce/Enlarge. Lets you enlarge or reduce the size of a graphical display by a percentage of the standard size. To enlarge your display, specify a percentage larger than 100. To keep the proportions, specify the same percentages for X-scale and Y-scale.

Although the HEIGHT and WIDTH commands also affect the size of a display, they differ from Reduce/Enlarge in two important ways. First, Reduce/Enlarge affects all the plots made during a session, while HEIGHT and WIDTH affect only the current plot. Second, Reduce/Enlarge determines the size of the entire display, including titles, legends, etc.; HEIGHT and WIDTH determine the size of the plot frame only (titles and legends are added to the size of the display specified by HEIGHT and WIDTH).

Be careful if you use Reduce/Enlarge together with HEIGHT and WIDTH. If you set both Reduce/Enlarge coordinates to 75% and then draw a scatterplot, setting HEIGHT to 4 inches and WIDTH to 6 inches, you get a plot that is 3 inches tall and 4.5 inches wide. If, instead, you had set Reduce/Enlarge to 75% for Y-scale and 67% for X-scale, with the same height and width settings you would get a plot 3 inches tall and about 4 inches wide.
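The arithmetic in the HEIGHT and WIDTH example above is the frame size multiplied by each Reduce/Enlarge percentage. The Python sketch below (a hypothetical helper that covers the frame only, ignoring titles and legends) reproduces both cases.

```python
def scaled_frame(height, width, y_scale_pct, x_scale_pct):
    """Frame size after Reduce/Enlarge percentages are applied to HEIGHT/WIDTH."""
    return height * y_scale_pct / 100, width * x_scale_pct / 100


print(scaled_frame(4, 6, 75, 75))  # (3.0, 4.5)
print(scaled_frame(4, 6, 75, 67))  # (3.0, 4.02), i.e. about 4 inches wide
```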


Line thickness. Scales the thickness of the lines in the graph by the number specified. 
For example, enter 2 to make the lines twice as thick. The maximum value you can 
enter here is 10. 


Character size. Controls the size of characters in the display's labels, titles, etc. For 
example, 0.5 makes the characters half their normal size. To control the size of 
characters used inside graphs (labels or letters used as plot symbols), use Label size in 
the Symbol and Label tab of the graph's dialog box. 


Origin. Positions the origin of the plot on the page (that is, the lower left corner of the display). Specify the X and Y coordinates. The unit of measurement is determined by the setting in the Measurement units drop-down list. The default origin is (2.25 inches, -5.5 inches). Origin changes the position of all subsequent graphs. To change the position of an individual graph, use Graph location from origin in the Layout tab of the graph's dialog box.


Eye. Controls the perspective for viewing a 3-D plot. Specify the X, Y, and Z coordinates. The unit of measurement is determined by the setting in the Measurement units drop-down list. Defaults are approximately x=-6 inches, y=-8 inches, z=6 inches.


Facet. Specifies the plane onto which subsequent 2-D graphs are plotted. This is used for overlaying 2-D plots in a 3-D perspective.


Depth. Controls the position of a plane along a facet. Enter any number here. The unit of measurement is determined by the setting in the Measurement units drop-down list. For example, if the unit of measurement is Centimeters, entering 3 specifies a 3-centimeter depth.


Measurement units. Select Inches, Centimeters, or Points from the drop-down list. These units apply to all graph dialog boxes until you change them.
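Switching units rescales every distance you enter. The Python sketch below converts among the three units; it assumes SYSTAT's Points are the common 72-per-inch desktop-publishing points, which the text does not state, and the names are hypothetical. The factor of 2.54 centimeters per inch is exact.

```python
# Units per inch. 2.54 cm per inch is exact; 72 points per inch is the usual
# desktop-publishing point and is an assumption about SYSTAT's "Points".
UNITS_PER_INCH = {"Inches": 1.0, "Centimeters": 2.54, "Points": 72.0}


def convert(value, from_unit, to_unit):
    """Convert a distance between the three measurement units."""
    inches = value / UNITS_PER_INCH[from_unit]
    return inches * UNITS_PER_INCH[to_unit]


# The default origin (2.25 in, -5.5 in) expressed in the other units:
print([round(convert(v, "Inches", "Centimeters"), 3) for v in (2.25, -5.5)])
# [5.715, -13.97]
print([convert(v, "Inches", "Points") for v in (2.25, -5.5)])
# [162.0, -396.0]
```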
Background color. Select the box and click the adjacent color button to choose a color for the background of graphs drawn afterward. If the box is clear, the graph background is transparent and colorless.

If a background is specified as an option in a graph command, the global color specification is ignored.


Create graphs using monochrome palette. Creates all subsequent graphs using 256 
shades of gray instead of color. 


Switch active graph file to view mode when another is set active. Select this option to add each newly drawn graph with its own Graph Editor, which becomes active (shown in yellow). Graph Editors of previously drawn graphs are pushed behind, one after another, as inactive Graph Editors (shown in gray). If this box is not selected, no new Graph Editor is created.

Switch active graph file to view mode when another is set active. By default, when a graph is drawn, the currently active graph closes. You can instead set the currently active graph to switch to view mode rather than close.


Threshold for hexagonal binning. Hexagonal binning creates a kind of bivariate histogram for large data sets, in which the X-Y plane is tessellated by a regular mesh of hexagons. All hexagons have the same radius, but each hexagon's color depends on the frequency range into which the count of x-y values in it falls. You can set SYSTAT to do automatic hexagonal binning. Enter a threshold value; if this number is greater than the number of cases in your file, hexagonal binning will not be done.


Number of hexagonal grid cuts. During automatic hexagonal binning, the plotting plane is divided using 25 grid cuts along both the X and Y axes. You can specify any number in the range [2, 50]. For example, if you enter 7, the entire graph frame is split into a 7 x 7 square grid (grid lines are not displayed), and each cell frequency is represented by a colored hexagon (the graph legend shows the color coding).
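The two settings combine as sketched below in Python. The sketch follows the text's description (no binning when the threshold exceeds the number of cases; a cuts x cuts square grid with cuts limited to 2 through 50) and only counts the cases per cell; drawing each count as a colored hexagon is omitted. It assumes the grid spans the data minimum to maximum on each axis, and the helper names are hypothetical.

```python
def bin_counts(x, y, threshold, cuts=25):
    """Cell counts for automatic binning, or None when binning is skipped.

    No binning when the threshold exceeds the number of cases; otherwise the
    frame is split into a cuts x cuts square grid (cuts clamped to 2..50),
    and each cell's count would be drawn as one colored hexagon.
    """
    if threshold > len(x):
        return None
    cuts = max(2, min(50, cuts))
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    counts = [[0] * cuts for _ in range(cuts)]  # counts[row][col]
    for xi, yi in zip(x, y):
        col = _cell(xi, x_lo, x_hi, cuts)
        row = _cell(yi, y_lo, y_hi, cuts)
        counts[row][col] += 1
    return counts


def _cell(v, lo, hi, cuts):
    """Index of the grid cell containing v along one axis."""
    if hi == lo:
        return 0
    k = int((v - lo) / (hi - lo) * cuts)
    return min(k, cuts - 1)  # the maximum value belongs to the last cell


x = [0, 1, 2, 3]
y = [0, 1, 2, 3]
print(bin_counts(x, y, threshold=10))         # None: only 4 cases
print(bin_counts(x, y, threshold=2, cuts=2))  # [[2, 0], [0, 2]]
```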

Canvas background. The canvas background is a rectangular space that surrounds all the graphs that appear in a single display. You can fill the canvas with a different color and apply different color schemes. By choosing an option from the Color scheme drop-down list, you can apply various designs to the canvas. When None is selected in Color scheme, you can select a color through the Color dialog by clicking the Fill color button. (With other choices in Color scheme, the fill color is ignored.) You can also change the style, width, and color of the canvas boundary. Click the Color button, then choose a color from the Color dialog that pops up, or define a custom color. Select a line style from the Style drop-down list. Select or enter a value between 1 and 10 in the Width drop-down list.


Chapter 9


Font. Click this button to open the font dialog box. You can choose the font, font style, 
size and effects to be used in all subsequent graphs. 


Using Commands 


Local commands are all the options that appear after the slash in the syntax, whereas global commands do not appear after a slash. Global commands can appear before or after specifying the data file, but they affect only the graphs produced after the global command is issued.


FREQUENCY var
    To save space in a data file, cases with the same values for each variable are entered as a single case with a count stored in var. The sample size is the sum of the values of var. The values of var are truncated to integers before processing. If var contains negative values, the corresponding cases are treated as missing values. Issuing FREQ without an argument clears the current selection. The WEIGHT command does not work with graphics.
ORIGIN x y
    Positions the origin of the plot on the page.
SCALE x y
    Controls the height-width proportion.
THICK n unit
    Sets the thickness of lines in graphs.
CSIZE n unit
    Controls the character size of graph scales, axes labels, legends, and titles.
EYE x y z
    Controls the perspective for viewing a 3-D plot.
FACET XY or XZ or YZ
    Specifies the plane onto which subsequent 2-D graphs are plotted.
DEPTH n unit
    Controls the position of a plane along a facet.
MONOCHROME on or off
    Sets the palette to 256 shades of gray for all subsequent displays.
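The FREQUENCY rules can be mimicked outside SYSTAT by expanding each compressed case into repeated rows. In the Python sketch below, the helper name is hypothetical, and representing a missing frequency as `None` is an assumption.

```python
import math


def expand_frequency(cases, freq):
    """Expand compressed cases the way FREQUENCY var is described.

    Each frequency is truncated to an integer; cases with a negative (or
    missing, here None) frequency are treated as missing and dropped.
    The sample size is the length of the result.
    """
    expanded = []
    for case, f in zip(cases, freq):
        if f is None or f < 0:
            continue
        expanded.extend([case] * math.trunc(f))
    return expanded


rows = expand_frequency(["Europe", "Africa", "Asia"], [2.9, -1, 1])
print(rows, len(rows))  # ['Europe', 'Europe', 'Asia'] 3
```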


Local commands can appear only after specifying the data file. After specifying the data file with USE filename, your command line might look like the following:


COMMAND varlist / options


Replace options with one or more of the following, depending on the type of graph you are creating. For more detailed information, consult your SYSTAT Language Reference manual.


HEIGHT = n unit
    Specifies the physical height of a plot.
WIDTH = n unit
    Specifies the physical width of a plot.
ALTITUDE = n unit
    Specifies the physical altitude of a 3-D plot.
LOC = n1, n2, ...
    Specifies the location of the plot as x, y coordinates from the origin.
XLABEL = 'text' (also YLABEL, ZLABEL)
    Labels the axis with the text string specified by text.
LABEL = var$
    Labels each case with the value of var$.
LEGEND = x unit y unit (or NONE)
    Determines the location of the lower left corner of the legend.
LLABEL = 'text1', 'text2', ...
    Labels each value in the legend of a multivalued graph.
LTITLE = 'text'
    Places a title above the legend.
TITLE = 'text'
    Places a title above the graph.
FTITLE = off
    Hides the titles of frames generated by the GROUP option.
FTITLE = on
    Displays the titles of frames generated by the GROUP option. (If this option is not specified, frame titles are shown by default.)
SIZE = n (or var)
    n is a symbol size multiplier.
SYMBOL = n1, n2, ... (or charlist or varlist)
    Specifies plot symbols.
XMAX = n (also YMAX, ZMAX, or WMAX)
    Gives the maximum scale value.
XMIN = n (also YMIN, ZMIN, or WMIN)
    Gives the minimum scale value.
AXES = type (for example, NONE or BOTTOM or CROSS)
    Specifies the type of axes to print.
SCALE = type (for example, NONE or BOTTOM or CROSS)
    Specifies on which axes to print scales.
TRANSPOSE
    Rotates the plot 90 degrees.
XTICK = n (also YTICK, ZTICK, or WTICK)
    Divides the axis into n intervals.
STICK = IN (or OUT or THROUGH)
    Forces tick marks outside or through the graph frame (axes).
TICK = FLUSH (or FLOAT or INDENT)
    Locates tick marks in relation to the ends of the axis.
XPIP = n (also YPIP, ZPIP, or WPIP)
    Divides each interval between tick marks into n intervals.
XLOG = n (also YLOG, ZLOG, or WLOG)
    Logs the data to the base n before plotting.
XPOW = n (also YPOW, ZPOW, or WPOW)
    Raises each data value to the nth power.
ROW = n
    Number of rows in multipanel displays.
COL = n
    Number of columns in multipanel displays.
XREV (also YREV, ZREV, or WREV)
    Reverses the scale.
SERROR = n (or vars)
    Adds standard error bars.
ERROR = n (or vars)
    Adds error bars in standard deviation units.
DIRECTION = var
    Variable(s) specify the direction of the error bars.
ETYPE = TICK (or BOX)
    Specifies the type of error bar, tick or box.
ETHICK = n
    Specifies the length of the horizontal ticks that bound the error bar.
COLOR = colorlist (or numlist or varlist)
    Colors graph elements (bars, points, lines, etc.).
FCOLOR = colorlist (or numlist or varlist)
    Colors the foreground.
ACOLOR = colorlist (or numlist or varlist)
    Colors axes, scales, and labels.
FILL = n1, n2, ... (or varlist)
    Specifies fill patterns for symbols, bars, etc.
XGRID (also YGRID, ZGRID)
    Draws grid lines extending from the tick marks.
XLIMIT = n, p (also YLIMIT, ZLIMIT)
    Adds dashed lines to axes to mark control limits.
DASH = n
    Specifies the type of line used in plot elements.
SLOPE
    Applies the Cleveland median slope adjustment.
THREED
    Adds a pseudo depth to a 2-D display, making it appear 3-D.
XFORMAT = n (also YFORMAT, ZFORMAT, WFORMAT)
    Writes numbers on plot scales with n digits following the decimal point.
CSIZE
    Specifies the size of characters inside the graph.
SURFACE = XCUT (or YCUT or ZCUT or XYCUT or COLOR or FILL)
    Specifies the direction of surface cuts and the color and fill of surfaces.
CUT = n
    Controls smoothers of a 2-D smoother or function.

Examples

Example 1

Error Bars


In the following plots, we display error bars in standard error units. The first plot represents the error bars using lines with ticks. The second uses a box representation.
The input is:


USE RCITY
BEGIN
BAR PCTTAXES * REGION$ / SERROR FILL=0 LOC=-2.5IN,0IN
BAR PCTTAXES * REGION$ / SERROR FILL=0 ETYPE=BOX LOC=2.5IN,0IN
END

The top of each bar or box extends one standard error above the mean.
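The bar and box tops follow from the usual formulas: a standard error is the standard deviation divided by the square root of the sample size. The Python sketch below assumes the sample (n - 1) standard deviation and treats the n in SERROR = n or ERROR = n as a multiplier of the unit; neither detail is confirmed by the text, and the helper name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def error_bar(values, kind="serror", k=1.0):
    """(low, mean, high) for an error bar k units wide on each side.

    kind="serror": standard error units (sample SD / sqrt(n));
    kind="error":  standard deviation units.
    Uses the sample (n - 1) standard deviation; k = 1 gives one unit.
    """
    m = mean(values)
    s = stdev(values)
    unit = s / sqrt(len(values)) if kind == "serror" else s
    return m - k * unit, m, m + k * unit


low, m, high = error_bar([1, 2, 3, 4, 5])
print(round(high, 4))  # 3.7071: one standard error above the mean of 3
```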
Example 2
Axes to Display
Here, we use L to select the left and bottom axes only.
The input is:
USE RCITY
BAR REGION$ / AXES=L
The output is:


Example 3
Scales to Display


Here is a bar chart with only a bottom scale and bottom axis.


The input is:
USE RCITY
BAR REGION$ / AXES=BOTTOM SCALE=BOTTOM


The output is:


Example 4
Axis Labels


Here, we add axis labels to a scatterplot with the following commands.


The input is:
USE RCITY
PLOT PCTTAXES * WORKWEEK / FILL=1,
     XLAB='Length of Workweek (in hours)',
     YLAB='% of Income Workers Pay in Taxes'


Note that with these long labels, we avoid splitting them across the lines of the graph.


The output is:


(Figure: scatterplot of PCTTAXES against WORKWEEK, with the y axis labeled % of Income Workers Pay in Taxes.)


Example 5
Limits and Reference Lines


As an example, let's mark the approximate 10th and 90th percentiles of the PCTTAXES distribution (we obtained 7 and 32 for these limits from the cumulative percentages printed by the List option on Xtab).


The input is:
USE RCITY


PLOT PCTTAXES * WORKWEEK / YLIMIT=7,32 LABEL=CITY$ CSIZE=1.3


The output is:


Clearly the workers in a few European cities stand out as having to contribute more of their income to taxes and social security than those in other cities; at the same time, however, these workers' workweeks are among the shortest!


Example 6
Transposing a Display


In this example, we present two plots. The second corresponds to a 90-degree rotation of the first.


The input is:


USE RCITY
BEGIN
BAR PCTTAXES*REGION$ / LOC=-3.5IN,0IN
BAR PCTTAXES*REGION$ / TRANS LOC=3.5IN,0IN
END


The output is:


(Figure: bar chart of PCTTAXES by REGION$, left, and the same chart transposed, right.)
Example 7
Formats for Plot Scales


SYSTAT offers two methods for setting formats for plot scales:

■ Specify the number of digits following the decimal point.

■ Use a picture format with commands, where, for example, the format is "MMDD, YYYY" or "####.##". The quotation marks are required.

Here, after a log transformation, we use formats to display three trailing zeros after the decimal point by specifying 3 for the number of decimals.


The input is:


USE OURWORLD
BEGIN
PLOT POP_1983 * POP_2020 / XLOG YLOG FILL=1,
     LOC=-2.5IN,0IN
PLOT POP_1983 * POP_2020 / XLOG YLOG XFORM=3 YFORM=3,
     FILL=1 LOC=3IN,0IN
END


The output is:

(Figure: log-scaled scatterplots of POP_1983 against POP_2020, with default scale labels, left, and three decimals, right.)
Example 8
Pip Marks and Tick Marks


In the default display of PCTTAXES versus WORKWEEK, PCTTAXES has tick marks at 10, 20, 30, ... forming six intervals on the y axis, and WORKWEEK has ticks at 35, 40, and 45. Here, we specify y-axis tick marks as 12 to label every 5% increment on the scale, and x-axis pip marks as 5 to place a pip mark at every hour on the x axis.


The input is:


USE RCITY
BEGIN
PLOT PCTTAXES * WORKWEEK / FILL=1 LOC=-2.5IN,0IN
PLOT PCTTAXES * WORKWEEK / YTICK=12 XPIP=5 FILL=1 LOC=3IN,0IN
END


The output is:

(Figure: scatterplots of PCTTAXES against WORKWEEK with default ticks, left, and with YTICK=12 XPIP=5, right.)
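The tick and pip positions in this example can be computed directly. The Python sketch below assumes the y axis runs from 0 to 60 (six intervals of 10) and the x-axis ticks sit at 35, 40, and 45, as described above; the helper names are hypothetical.

```python
def tick_positions(lo, hi, intervals):
    """Tick marks that divide [lo, hi] into `intervals` equal parts (XTICK/YTICK)."""
    step = (hi - lo) / intervals
    return [lo + i * step for i in range(intervals + 1)]


def pip_positions(ticks, pips):
    """Pip marks that divide each interval between ticks into `pips` parts."""
    marks = []
    for a, b in zip(ticks, ticks[1:]):
        step = (b - a) / pips
        marks.extend(a + j * step for j in range(1, pips))
    return marks


print(tick_positions(0, 60, 12))       # 0.0, 5.0, ..., 60.0: every 5%
print(pip_positions([35, 40, 45], 5))  # [36.0, 37.0, 38.0, 39.0, 41.0, 42.0, 43.0, 44.0]
```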


Example 9
Transformations


In this example, we use OURWORLD data that have variables recorded for 57 countries. First, we plot population projections for the year 2020 versus the population in 1990, and then we log transform both variables.


The input is:


USE OURWORLD
BEGIN
PLOT POP_2020 * POP_1990 / FILL=1 LOC=-2.5IN,0IN
PLOT POP_2020 * POP_1990 / XLOG YLOG FILL=1 LOC=3IN,0IN
END


The output is:


Notice that the points in the left plot clump in the lower left corner. These data are not well suited for analyses based on linearity (correlations, regression, analysis of covariance, discriminant analyses, etc.). The shape of the bivariate distribution after log transformation (right plot) is more satisfactory for classical statistical analysis than that for the untransformed values. To eliminate the trailing zeros on the plot scales, set the number of decimals to 0.
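The two ideas in this example and in Example 7, logging the data before plotting and writing scale labels with a fixed number of decimals, can be sketched in Python. The population values are made up for illustration (they are not the OURWORLD data), the helper names are hypothetical, and the `base` argument stands in for the n in XLOG = n.

```python
import math


def log_values(values, base=10):
    """Log-transform values, as XLOG/YLOG = base would before plotting."""
    if any(v <= 0 for v in values):
        raise ValueError("a log scale needs positive values")
    if base == 10:
        return [math.log10(v) for v in values]
    return [math.log(v, base) for v in values]


def scale_labels(ticks, decimals):
    """Tick labels with a fixed number of decimals, as in XFORMAT = n."""
    return [f"{t:.{decimals}f}" for t in ticks]


# Hypothetical populations (millions) spanning three orders of magnitude:
# clumped near zero on a linear axis, evenly spaced after the log.
pops = [1, 10, 100, 1000]
print(log_values(pops))               # [0.0, 1.0, 2.0, 3.0]
print(scale_labels([0, 1, 2, 3], 3))  # ['0.000', '1.000', '2.000', '3.000']
print(scale_labels([0, 1, 2, 3], 0))  # ['0', '1', '2', '3']
```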


Example 10
Size and Shape of Plots


Let's make a short, wide bar graph. Specify a height of 1.4 and a width of 3 (assuming measurement units are inches). These measurements apply to the frame of the graph itself, not including labels for axes, titles, etc.
The input is:

USE RCITY
BAR REGION$ / HEI=1.4IN WID=3IN


The output is:


(Figure: short, wide bar chart of REGION$.)


Example 11
Placing Graphs Side by Side


In this example, we position a bar chart and a scatterplot side by side (we also order the bars by descending frequencies). Both plots are positioned 1 inch above the y coordinate of the default origin.


The input is:


USE RCITY
BEGIN
ORDER REGION$ / SORT=FDESC
BAR REGION$ / LOC=-.75IN,1IN HEI=2.5IN WID=2.5IN
PLOT PCTTAXES * WORKWEEK / LOC=2.5IN,1IN HEI=2.5IN,
     WID=2.5IN SYM=REGION$ LEG=NONE
END


The output is:


To use the menus and dialog boxes, start by choosing Begin Single Page Mode from the Graph menu. Next, create the first graph. After you click OK, SYSTAT waits for the next plot. SYSTAT holds all output until you choose End Single Page Mode from the Graph menu. SYSTAT then places all plots created between Begin Single Page Mode and End Single Page Mode (the menu equivalents of BEGIN and END) in one graph window (or on one piece of paper).


Example 12 
Arranging Displays in Rows and Columns 


ult quantile plots for the SURVEY2 depression inventory. 


For example, here are defa 
seeking behavior 


The SURVEY? data are responses to a survey on depression and һе1р- 
among adults (Afifi, Clark and May, 2003). A depression index was constructed by 
asking people to respond to 20 items; for example, “I felt I could not shake the blues.” 
The subject’s responses were scored 0 (less than one day last week), 1, 2, and 3 (five 
to seven days last week). We specify all 20 items (variables) in one request, so they 


appear together on the screen. 
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We're displaying 20 plots within one window; to make it less cluttered, we remove 
the scales and labels. 


The input is: 


USE SURVEY2 
QPLOT BLUE .. DISLIKE / SCALE-NONE YLAB=' ' XMIN--1 


The output is: 


These plots are very small, but because they are displayed within a single window, it is
easy to compare and contrast the responses. Look at the fourth graph in the top row (for
CRY) and the graph in the bottom left corner (GET GOING). Few people cried during
the previous week, but more than 50% had trouble "getting going."
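A quantile plot pairs each sorted response with an approximate cumulative fraction, which is how a panel such as GET GOING can be read as "more than half scored above 0." The Python sketch below illustrates the idea on ten made-up 0 to 3 scores (not the SURVEY2 data). The (i - 0.5)/n plotting position and both function names are assumptions for illustration; SYSTAT's QPLOT may use a different convention.

```python
def quantile_points(scores):
    """Pair each sorted score with a plotting position (i - 0.5) / n, i = 1..n.

    The (i - 0.5) / n convention is an assumption, not necessarily QPLOT's.
    """
    ordered = sorted(scores)
    n = len(ordered)
    # enumerate starts at 0, so (i + 0.5) / n equals ((i + 1) - 0.5) / n
    return [(value, (i + 0.5) / n) for i, value in enumerate(ordered)]


def fraction_above(scores, threshold):
    """Share of responses strictly greater than threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Hypothetical 0-3 item scores for ten respondents (not SURVEY2 data)
get_going = [0, 1, 2, 0, 3, 1, 0, 2, 1, 0]
for value, fraction in quantile_points(get_going):
    print(f"{fraction:.2f}  {value}")
print(fraction_above(get_going, 0))  # 0.6: more than half scored above 0
```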

Instead of the default four rows and five columns of plots, here's how the 
arrangement looks when we ask for seven columns. 


The input is: 
QPLOT BLUE .. DISLIKE / SC=NONE YLAB=' ' COL=7
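Fixing the number of columns with COL determines how many rows are needed. A small Python sketch of that arithmetic, assuming (as the default four-by-five layout suggests) that SYSTAT uses just enough rows to hold all 20 plots; the function name is hypothetical:

```python
import math


def grid_shape(n_plots, columns):
    """Return (rows, columns) for n_plots panels with a fixed column count.

    Assumes just enough rows are added to hold every panel.
    """
    rows = math.ceil(n_plots / columns)
    return rows, columns


print(grid_shape(20, 5))  # (4, 5): the default arrangement
print(grid_shape(20, 7))  # (3, 7): 21 cells, so one cell stays empty
```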
Example 13
Legend

Here, we define subpopulations using the integers 1 through 5 stored in the variable
REGION. In the first plot, we assign a legend name to each subpopulation. In the
second plot, we move the legend inside the plot frame.


The input is: 


USE RCITY
PLOT PCTTAXES * WORKWEEK / GROUP=REGION OVERLAY,
     LLABEL='North America',
            'Middle & South America',
            'Africa','Europe',
            'Asia / Pacific',
     SYMBOL=1,4,5,6,7,
     FILL=1 SIZE=1.3,
     COLOR=BLUE,RED,BLACK,GRAY,BROWN
PLOT PCTTAXES * WORKWEEK / GROUP=REGION OVERLAY,
     LLABEL='North America',
            'Middle & South America',
            'Africa','Europe',
            'Asia / Pacific',
     LEGEND=1.75IN,2IN,
     SYMBOL=1,4,5,6,7,
     FILL=1 SIZE=1.3,
     COLOR=BLUE,RED,BLACK,GRAY,BROWN
The output is:


Multiple Y Variables 


Here, we assign legend labels for the multiple y variables, LIFEEXPM and LIFEEXPF,
plotted against URBAN, the percentage of people living in cities.


The input is: 


USE SUBWORLD
PLOT LIFEEXPM LIFEEXPF * URBAN / FILL=1,0 SIZE=1.3,
     LLABEL='Male Life Expectancy (in years)',
            'Female Life Expectancy (in years)',
     OVERLAY LEGEND=4.4IN,0IN, SYMBOL=20,21
The output is:

[Figure: LIFEEXPM and LIFEEXPF (y axis, labeled Value) plotted against URBAN (x axis), with legend entries Male Life Expectancy (in years) and Female Life Expectancy (in years)]


Example 14 
Filled Plot Symbols 


Here, we introduce data from the SUBWORLD file, a subset of the OUR WORLD data.
We plot both male and female life expectancy on the y axis against URBAN, the
percentage of people living in cities, on the x axis. We plot the male values with the
male symbol (number 20) and the female values with the female symbol (number 21),
using fill pattern 1 for the male values and 0 for the female values. We also make
these symbols 1.5 times the default size. To display both y variables in a single
frame, we use OVERLAY.


The input is: 


USE SUBWORLD
PLOT LIFEEXPM LIFEEXPF * URBAN / SYMBOL=20,21,
     FILL=1,0 OVERLAY,
     SIZE=1.5,1.5
The output is:


Example 15 
Line Styles 


We now use line styles and connected points to distinguish multiple variables in a
dot plot, and line styles to distinguish groups in a display of 50% confidence ellipses.

Using the USVOTES data, we plot three variables (total votes for Bush, Clinton, and 
Perot in the 1992 election) in a dot plot. The votes are totals by census division. We 
use three line styles to connect points in left-to-right order. 


The input is: 


USE USVOTES
DOT PEROT..BUSH * DIVISIONS / LINE DASH=7,1,11,
    OVERLAY HEI=2IN WID=3.2IN,
    COLOR=BLUE,RED,BLACK
The output is:

[Figure: dot plot of PEROT, CLINTON, and BUSH vote totals by DIVISIONS, with legend entries PEROT, CLINTON, and BUSH]


The categories for each variable are connected in the order in which they appear in the
data file. LINE and DASH assign a line style to each variable: the solid line connects
the votes for Clinton; the short dashed line, the votes for Perot; and the dotted line,
the votes for Bush.


Multiple Groups 


Using the RCITY data, we request different line types for 50% confidence ellipses for 
each of the five geographic regions. 


The input is: 


USE RCITY 
PLOT PCTTAXES * WORKWEEK / GROUP=REGION$ OVERLAY,
     ELL=.5 DASH=5,9,3,1,2,
     COLOR=BLACK SYM=1 FILL=1, LEG=NONE
The output is:

[Figure: PCTTAXES plotted against WORKWEEK, with a 50% confidence ellipse for each region]


The vertical ellipse near the middle of the display with a solid line centers on North 
American cities; the ellipse perpendicular to it with the longest dashes, the 
Pacific/Asian cities; and the vertical ellipse at the upper left, the European cities. 
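Each 50% ellipse summarizes one region's center, spread, and correlation. The Python sketch below shows one standard way to compute such an ellipse from a sample mean and covariance matrix. It assumes bivariate normality and the chi-square (2 df) radius r^2 = -2 ln(1 - p); SYSTAT's ELL option may scale the ellipse differently (for example, with an F-based multiplier), and the function name and data are hypothetical.

```python
import math


def confidence_ellipse(xs, ys, p=0.5):
    """Return (center, semi_major, semi_minor, angle) of a p-probability ellipse.

    Assumes bivariate normal data and the chi-square (2 df) radius
    r**2 = -2 * ln(1 - p); angle is the major-axis direction in radians.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (n - 1 denominator)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of the 2 x 2 covariance matrix
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below 0
    r = math.sqrt(-2 * math.log(1 - p))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), r * math.sqrt(lam_major), r * math.sqrt(lam_minor), angle


# Hypothetical WORKWEEK and PCTTAXES values for one region
center, a, b, angle = confidence_ellipse([40, 42, 44, 41, 43], [20, 25, 28, 22, 30])
print(center, round(a, 2), round(b, 2), round(math.degrees(angle), 1))
```

A tilted ellipse indicates correlation within a group; the angle is measured from the x axis.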


Example 16 


Symbols 


As examples, we add four symbol requests to the default scatterplot of PCTTAXES by
WORKWEEK for the RCITY data. For the first plot, we use the built-in star symbol
(number 10) to plot each point. The second plot uses symbols corresponding to the
numeric variable REGION, which has codes 1 to 5 for geographic regions. Each European
city has the value 4, so the points for these cities are plotted with symbol 4, an
upward-pointing triangle. In the third plot, we select the character variable REGION$
for the plotting symbol, so the symbol for each city is the first letter of its region
name. Finally, in the last plot, we use the # character as the plot symbol.

To make larger symbols, we also select the option to multiply the default size by 1.5 
for the second and fourth plots. 
The input is:


USE RCITY
BEGIN
PLOT PCTTAXES * WORKWEEK / SYMBOL=10 FILL=1,
     LOC=-3IN,3IN,
     TITLE="SYSTAT's built-in code 10"
PLOT PCTTAXES * WORKWEEK / SYMBOL=REGION SIZE=1.5,
     FILL=1 LOC=2.5IN,3IN YLAB=" ",
     TITLE="Numeric variable REGION",
     LEGEND=NONE
PLOT PCTTAXES * WORKWEEK / SYMBOL=REGION$,
     LOC=-3IN,-3IN,
     TITLE="Character variable REGION$"
PLOT PCTTAXES * WORKWEEK / SYMBOL='#' SIZE=1.5,
     LOC=2.5IN,-3IN YLAB=" ",
     TITLE="Keyboard character #"
END
The output is:

[Figure: four scatterplots of PCTTAXES against WORKWEEK, one for each symbol request, including panels titled SYSTAT's built-in code 10, Numeric variable REGION, and Keyboard character #]
Example 17
Symbol Size 


As an example, let's make the stars in a plot 2.25 times the default size. 


The input is: 
USE RCITY 
PLOT PCTTAXES * WORKWEEK / SYMBOL=10, SIZE=2.25,
     FILL=1
The output is: 


Symbol Size Using Numeric Variables 


Here, we select the variable REGION to represent symbol size. This variable contains 
integer codes from 1 to 5 for geographic regions. The size of each symbol is 
determined by its code. We use the circle as our plot symbol. 


The input is: 


USE RCITY 
PLOT PCTTAXES * WORKWEEK / SIZE=REGION FILL=1,
     LEGEND=NONE




The output is: 


The cities in NORTH AMERICA have code 1, so their symbols are the smallest. Notice
the three tiny circles at approximately the 30% tax level; these are for CHICAGO,
MONTREAL, and TORONTO. The largest circles, on the right, are for MANILA and
HONG KONG, which have code 5, the code for cities in PACIFIC/ASIA.


Example 18 
Labeling by Cases 


In this plot, we label each point with the name of the city. These values are stored in 
the variable CITY$. 


The input is: 


USE RCITY 
PLOT PCTTAXES * WORKWEEK / LABEL=CITY$ SYMBOL=8 FILL=1,
     CSIZE=1.3
The output is:


Zooming in on Plot Areas 


Several of the points on this graph are close together, making their names hard to read.
Let's add range limits and zoom in on the lower-left corner to examine them more clearly.


The input is: 
PLOT PCTTAXES * WORKWEEK / LABEL=CITY$, XMIN=30 FILL=1,
     XMAX=40 YMIN=0 YMAX=30 CSIZE=1.3


The output is: 
Example 19
Perspective 


Let's look up at a three-dimensional plot. 


The input is: 


BEGIN
FPLOT Z = X^2-Y^2 / ZFORM=0 CUT=10, LOC=-3,0
EYE -5,-2,-1
FPLOT Z = X^2-Y^2 / ZFORM=0 CUT=10, LOC=3,0
END


The output is: 


Example 20 
Increasing the Line Thickness 


Here, we increase the thickness of the axis scales and the lines in the graph.


The input is: 


THICK 9
USE IRIS
PLOT SEPALLEN*PETALLEN / GROUP=SPECIES OVERLAY ELL
THICK
The output is:


Example 21 
Increasing the Character Size 


Here, we increase the character size of graph scales and plot labels.
The input is: 

CSIZE 4
USE IRIS
BAR SPECIES / LABEL, XLAB='' YLAB=''
CSIZE


The output is: 




Note: Global settings remain set for every graph until you reset them, even if you 
restart SYSTAT. 
Editing Graphs


Debasish Ghosh and K. V. S. Rammurty 
(modified version of the chapter by Alexander Khalileev 
and its revision by B. Rajakumar and S. Anoopama) 


Once you have created a graph, you can use the Graph Editor to change many of its
features without recreating the graph. Using the Graph Properties dialog box, you can
rotate three-dimensional displays, apply power and log scales to axes, animate graphs
either manually or automatically, zoom along the X, Y, and Z axes, change properties
such as color, axes, labels, symbols, titles, and graph size, change the aesthetics of
graph objects, draw limit lines and grid lines, and reverse scales. Using the Graph
Editing toolbar, you can annotate your graphs and select subsets of cases for further
analysis. To select a Graph Editing tool, from the menus choose:


Graph 
Edit 


Graph Editor 


When you create a graph by using a dialog box or by running a command from the
Interactive tab of the Commandspace, it appears in the Graph Editor. Graphs generated
from the Untitled or Log tab of the Commandspace appear in the Output Editor. To view
such a graph in the Graph Editor, double-click it, click its Graph Editor tab (labeled
Graph1, Graph2, and so on), or double-click the corresponding node in the tree in the
Output Organizer tab. If you have produced more than one graph and you are in the
Graph Editor, you can view the desired graph by clicking its node in the Output
Organizer tree of the Workspace.


The Graph Editor displays graphs for two purposes: to make graph editing and feature
selection available, and to show the graph exactly as it will appear when printed from
the Graph Editor. It therefore has two views, Graph View and Page View. By default,
the Graph Editor tab opens in Graph View.


To switch between the two views, from the menus choose:
View 
Page View 
or 
View 
Graph View 


Alternatively, you can use the tools available in the Graph Editing toolbar to switch 
between the two views. 
As you prepare your graphs, you may want to change the orientation of your page. You
can change from portrait (taller than wide) to landscape (wider than tall) and back
using the Windows Print Properties dialog box, which you can reach from the Print
dialog box.


Graph View 


You will do most of your editing in Graph View. It makes most of the editing features
available, such as the Graph Properties dialog box, selection tools, drawing tools,
annotation tools, zooming, and panning. You can also open the Dynamic Explorer from
the Workspace when you are in Graph View.




In addition, Graph View allows you to resize graphs. Click the graph frame to select
it. A rectangle with selection "handles" at its edges appears around the selected graph.
Click and drag any one of the handles until the graph reaches the desired size. You can
also reposition legends by clicking and dragging them.

When you right-click in Graph View, the shortcut menu offers the following options:

■ Realign Frames.
■ Animate (available if all the graphs are 3-D graphs, or 3-D graphs with a corresponding contour or tile).
■ Copy, to copy the graphs.
■ Print Preview.
■ Show Toolbar, to show the selection tools, drawing tools, annotation tools, and so on.
■ Save As, to save the graphs as an image.
■ Options, to modify global graph options.
■ Close, to close the Graph Editor.


Page View 


Page View allows you to view the position of the graph(s) on the page before
printing. Note, however, that this positioning affects the graph only when it is printed
from the Graph Editor; it does not affect the positioning of the graph in the Output
Editor. As in Graph View, you can resize graphs in Page View by dragging the handles
of the rectangle that appears around the edges of the selected graph.

Page View also enables you to size the bounding rectangle and the area of white 
space that surrounds each graph. You may want to shrink the bounding rectangle if you 
save graphs as graphics files and use them in other applications or reports. To size the 
bounding rectangle, in Page View, click to select the graph. The bounding rectangle 
appears with "handles" at its edges. Click and drag any corner to size the rectangle at 
that point. 
Dynamic Explorer


The Dynamic Explorer becomes active only when a graph is in the Graph Editor
and the Graph Editor is active.


The Dynamic Explorer contains the following tools for changing your views of a graph:

■ 3-D rotation. 3-D graphs can be rotated either manually or automatically. You can
zoom individual axes and use the annotation tools while the graph is rotating
automatically.

■ Zoom +/-. Zooms individual axes of the graph.
Selection Tools for Graphs


SYSTAT has several tools to help you identify and select data points in plots.


The Single Point Highlight tool allows you to identify a single point in a plot. With this
tool selected, when you click a data point, the focus shifts to the Data Editor with that
case highlighted.

In scatterplots, use the Region selection or Lasso selection tools to select subsets of 
cases for further analysis. Use the Region selection tool to select a rectangle-shaped 
region, and use the lasso tool to encircle an irregularly shaped area. To select 
noncontiguous groups of data points, hold down the Shift key while selecting each 
desired area in succession. 
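The Region and Lasso tools decide which points fall inside the area you mark. SYSTAT does not document its own test, so the following Python sketch only illustrates the usual geometry: a bounds check for a rectangle and the even-odd (ray casting) rule for an irregular lasso outline, with Shift-selection modeled as the union of several areas. All function names and coordinates are hypothetical.

```python
def in_rectangle(point, corner_a, corner_b):
    """True if point lies inside the axis-aligned rectangle spanned by two corners."""
    (x, y), (x1, y1), (x2, y2) = point, corner_a, corner_b
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)


def in_lasso(point, polygon):
    """Even-odd (ray casting) test: is point inside the closed polygon?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the outline
        # Does a horizontal ray from the point to the right cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select(points, areas):
    """Indices of points inside any area, like Shift-selecting several regions."""
    return [i for i, p in enumerate(points) if any(test(p) for test in areas)]


points = [(1, 1), (5, 5), (9, 1)]
triangle = [(0, 0), (4, 0), (0, 4)]
areas = [lambda p: in_lasso(p, triangle), lambda p: in_rectangle(p, (8, 0), (10, 2))]
print(select(points, areas))  # [0, 2]
```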

To remind yourself of which cases you have selected, click the Show Selection 
button. The points that are currently selected are highlighted. 

Cases selected with the Region selection or Lasso selection tools also appear 
selected in the Data Editor. If you repeat your graphing command, only the selected 
cases appear in the new graph. To reset the selection, select the Region selection or 
Lasso selection tool and click once in an empty area of the Graph Editor. Or, from the 
menus choose: 


Data 
Select Cases... 


Select Turn off and click OK. 


Annotation Tools 


The Graph Editing toolbar provides drawing tools that you can use to annotate your
graphs. These tools let you draw lines, polylines, arrows, rectangles, circles, and
ellipses on top of your graphs. In addition, the text tool enables you to add text
notations. You can format shapes and text before or after drawing them. You can click
a drawing to select it and drag it to the position where you want to place it. To format
before drawing a shape, use the Text Tool Font and Drawing Attributes options, which
are available on the Graph Editing toolbar.
The Pointer tool makes graph selection available. The text tool writes text on the
graph. To display the annotation tools, right-click in the Graph Editor and select the
Show Toolbar option. To annotate, click the tool, click the place in the Graph Editor
where you want the annotation, and then drag while holding down the mouse button.
For polylines and arrows, a single click completes an edge, while a double-click
completes an edge and also releases the cursor from the annotation tool.


Drawing Attributes and Text Tool Font 


The Drawing Attributes tool enables you to control the appearance of lines and
shapes added to graphs. Clicking this tool opens the following dialog box:


You can specify the fill style and its color, and the color and line thickness of lines,
ellipses, and rectangles that you draw. You can also specify a fill pattern and the color
of that fill pattern, if applicable. When you double-click with the Pointer tool on a
drawing made by an annotation tool, the Drawing Attributes dialog box opens and
allows you to modify that drawing.
To open the Drawing Attributes dialog box, from the menus choose:


Graph 
Edit 
Drawing Attributes... 


The Text Tool Font option controls the attributes of the text tool. Clicking this tool
opens the following dialog box:


The Text Tool Font dialog box defines the typeface, size, rotation, color, background,
and style for text annotations. You can also apply strikethrough and change the font
case. When you double-click with the Pointer tool on text notations added by the text
tool, the Text Tool Font dialog box opens, and you can modify those notations through
that dialog box.
To open the Text Tool Font dialog box, from the menus choose:


Text Tool Font... 


Panning and Zooming 


The Graph Editing toolbar provides tools for panning and zooming. These tools let you
pan graphs, zoom in and out, and zoom in on a selected portion of a graph.


Panning. This tool allows you to move the graph to different positions in the Graph
Editor. The advantage of moving the graph with the Pan tool rather than the Pointer
tool is that the graph need not be selected, and panning is possible even when the
graph is zoomed.


Zoom In. This tool zooms in on your graph.
Zoom Out. This tool zooms out of your graph.
Zoom Selection. This tool allows you to zoom in on a selected portion of the graph.


Resetting Graphs and Tooltips 


Reset Graph. This tool returns the graph to editing mode. After zooming, you can use
it to restore your graph to its default size and position.


Graph Tooltips. This tool shows or hides the tooltip on the graph. When this option is
selected, the tooltip is displayed in the status bar.


Levels in the Graph Editor 


Graphs in the Graph Editor have two levels: the frame level and the pane (or graph
view) level. Frames are the structures that surround each individual graph. The pane,
or graph view, is the part of the Graph Editor outside the frames.

You often get multiple frames encased in a pane when you use a grouping variable.
For example, if you create a graph of rain*summer winter, you get a graph with two
frames, one containing the rain*summer graph and the other containing the
rain*winter graph. Single page mode is another case in which you end up with
multiple frames encased in a pane. By contrast, if you use the Overlay multiple graphs
into a single frame feature, you get many graphs in a single frame, and therefore also
in a single pane.


In Graph View, you can edit individual frames of a multi-frame graph, whether these
frames are placed side by side or overlaid in single page mode. You can pull apart
overlaid frames layer by layer to view and edit each one separately. To return the
frames to their original alignment, choose Realign Frames from the Graph Editing
toolbar, or from the menus choose:
Graph 

Realign Frames 
Keep in mind that overlaying frames and overlaying multiple graphs into a single 
frame yield differing frame structures. In single page mode, each graph corresponds to 
a unique frame. However, when multiple graphs are overlaid into a single frame, you 
cannot pull apart the graphs for editing because one frame contains all graphs. 
Interactivity in Graph Editor using Graph Properties dialog box


Double-clicking in the Graph Editor opens the Graph Properties dialog box.
Alternatively, right-click in the Graph Editor and click Properties, the first item on the
shortcut menu. Through the Graph Properties dialog box, you can modify features of a
graph, frame, axis, legend, and element. When you click OK, the changes are
committed and applied permanently to the graph. When you click Cancel, all
uncommitted changes are rolled back. Until you click OK or Cancel, the Graph
Properties dialog box remains available even when focus shifts away from the Graph
Editor. However, selecting a node in the Output Organizer commits the changes to the
current graph and closes the dialog box. If you move the dialog box to a convenient
position, it opens at that position for the rest of the session. Graphs in the output view
of a .syo file can also be edited with the Graph Properties dialog box after you
double-click the graph to invoke the Graph Editor.

The Graph Properties dialog box does not open for stacked bar charts, SPLOMs, or dual
displays. For Quick Graphs, SYSTAT allows aesthetic changes such as changing the
font, color, fill pattern, symbol, and style.
Coloring Scheme


SYSTAT allows you to set the color of fonts, the fill inside a symbol, symbol
boundaries, tick marks, axis lines, and elements by choosing a color from the color
palette that opens when you press the corresponding color button. In the Color palette,
in addition to the 48 predefined colors, you can access more than 16 million colors
using Define Custom Colors. To define a color, either specify its RGB (Red-Green-Blue)
or Hue-Sat-Lum (Hue-Saturation-Luminosity) values, or click in the square palette of
colors and use the slider on the right to adjust the shading; then press Add to Custom
Colors.


[Figure: Color dialog box with Basic colors, Custom colors, and the Define Custom Colors button]
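The "more than 16 million colors" figure follows from 256 levels for each of the three RGB channels. The Python sketch below, using only the standard colorsys module, checks that count and converts an RGB triple to hue, saturation, and luminosity. It reports all three on a 0 to 1 scale; the Color dialog's own numeric ranges may differ, and the helper name is hypothetical.

```python
import colorsys

# 256 levels per channel (0-255) for red, green, and blue
print(256 ** 3)  # 16777216


def rgb_to_hue_sat_lum(r, g, b):
    """Convert 0-255 RGB values to (hue, saturation, luminosity) on a 0-1 scale.

    colorsys works in HLS order (hue, lightness, saturation), so the
    result is reordered to match the Hue-Sat-Lum naming.
    """
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return h, s, l


print(rgb_to_hue_sat_lum(255, 0, 0))  # (0.0, 1.0, 0.5): pure red
print(rgb_to_hue_sat_lum(0, 0, 255))  # pure blue, hue of about 2/3
```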


Levels of Interactivity 


The Graph Properties dialog box has five groups, which correspond to the levels of
interactivity:

■ Graph
■ Frame
■ Axes
■ Legend
■ Element
Each group contains a number of tabs whose settings let you achieve the desired look
for the graph. In a list box, you can try the available options one at a time by moving
up or down the list with the up and down arrow keys. If you enter an invalid value in a
box, the box turns pink to signal the error, and you cannot move to any other tab or
group until you enter a valid value.

Depending on the type of graph, some tabs in a group, and in some cases a whole
group, may be disabled or invisible. For example, if you are modifying a plain bar chart
drawn without any options, the Legend group and the Font tab in the Frame group are
disabled.

Changes applied through the Graph Properties dialog box are not recorded in the Log.


Descriptions of the tabs under each of the five groups are given below. 


Graph


In the Graph group, the graph title always remains visible below the group icons. For
graphs without a title, default titles of 'Graph1', 'Graph2', etc. are assigned. If you have
multiple graphs in a single display (e.g., several graphs drawn within a BEGIN/END
block), all the titles appear in a list. In such a case, the list also contains the option
'All'. Selecting 'All' applies feature changes to all the graphs in the display. Selecting a
particular graph title makes that graph the current graph, to which changes are applied.
You can also select a graph by clicking it directly.


Tabs in the Graph group are: Options, Font, Layout, and Zoom/Rotate.
Options


Title. Select to edit the title of the graph in the adjacent box. To enter a multi-line title,
press the Enter key to go to the next line of the text box. To commit the title change,
press the Tab key. You can use escape sequences such as \H for double height and \>
and \< for superscript and subscript. See the help file for the WRITE command for
further details. For a graph without a specified title, this box remains empty, although
a default title is assigned to the graph.


Background color. Select the box and click the adjacent color button to choose a dense
color for the graph background. The graph background is transparent and colorless if
the Background color check box is clear.
Graph type. Change the graph to a similar type of display by clicking an option in the
list. Similar graph types are grouped below:

■ bar, dot, line, profile, pyramid chart
■ histogram, gap histogram, frequency polygon, fuzzygram
■ dot histogram, symmetrical dot density, jittered dot density, density stripes
■ normal curve, kernel curve

Coordinate system. Select from the list of relevant coordinate types. For graphs such as
maps, scatterplots, fplots, and icon plots with icon location, you can apply a projection.

Projection Type. When Projection is selected in Coordinate system, you can select a
projection transformation for the graph from the list. SYSTAT provides the following
projection options:

'Oblique gnomonic', 'Oblique stereographic', 'Oblique orthographic', 'Mercator
conformal', 'Lambert cylindrical', 'Miller cylindrical', 'Robinson', 'Peters', 'Sinusoidal',
'Fisheye'.
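The manual does not give the formulas behind these options. For orientation only, the Python sketch below implements the textbook spherical forms of two of them, Mercator conformal and Sinusoidal, for longitude and latitude in degrees on a unit sphere; SYSTAT's own implementation, scaling, and centering are not shown here and may differ, and the function names are hypothetical.

```python
import math


def mercator(lon_deg, lat_deg):
    """Spherical Mercator (conformal): x = lon, y = ln(tan(pi/4 + lat/2)).

    Undefined at the poles, where y grows without bound.
    """
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return lon, math.log(math.tan(math.pi / 4 + lat / 2))


def sinusoidal(lon_deg, lat_deg):
    """Sinusoidal (equal-area): x = lon * cos(lat), y = lat."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return lon * math.cos(lat), lat


for lat in (0, 30, 60):
    print(lat, mercator(90, lat), sinusoidal(90, lat))
```

The contrast is visible in the output: Mercator stretches y increasingly toward the poles to preserve angles, while Sinusoidal shrinks x with latitude to preserve area.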
Font


All font-related changes under this tab apply only to the graph title, not to other text
in the graph such as axis titles and tick labels.


Font. Select the font type for the graph title.


Style. Select the font style for the graph title.

Size. Type or select a font size for the graph title. Font size can range from 1 to 72.
Rotation. Type or select an angle (in degrees) to rotate the graph title. The rotation
angle can range from -360 to 360.

Color. Click the color button and choose a font color for the graph title.

Background color. Select the box and click the adjacent color button to choose a dense
color for the graph title background. The graph title background is transparent and
colorless if the Background color check box is clear.


Strikethrough. Select to apply strikethrough to the graph title (for example, Test is
displayed with a horizontal line through it).


Case. Select an option to modify the case of the graph title text. Select upper or lower
case to convert the text entirely to upper or lower case. Current, the default option,
displays the text in the case in which you typed it.


Layout 


Location from origin. You can set the position of the graph in the Page View window
by typing or selecting values in the Horizontal and Vertical boxes as x-axis and y-axis
coordinates, respectively. The default is the location of the selected graph. Refer to the
Layout page description in Chapter 9, Options and Features for Graphs.

Size. In the Height, Width, and Altitude (enabled for 3-D displays) boxes, type values
to set the dimensions of the graph. The dimensions include the axis titles but exclude
the graph title.

Slope adjustment. Select to adjust the scales of the graph automatically. SYSTAT adjusts
the height and width of the graph so that the median absolute physical slope is 1.
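This rule resembles Cleveland's median-absolute-slope method of banking to 45 degrees. The Python sketch below computes the height-to-width ratio that makes the median absolute physical slope of the segments joining consecutive points equal to 1. It assumes the axes span exactly the data range and skips vertical segments to avoid dividing by zero; SYSTAT's exact procedure is not documented here, the sketch only computes a ratio (SYSTAT adjusts both height and width), and the function name is hypothetical.

```python
import statistics


def banked_aspect_ratio(xs, ys):
    """Height/width ratio giving a median absolute physical slope of 1.

    Assumes the axes span the data ranges exactly, that both ranges are
    nonzero, and that the median slope is nonzero. Segments join
    consecutive points; vertical segments are skipped.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = []
    for x1, y1, x2, y2 in zip(xs, ys, xs[1:], ys[1:]):
        dx = abs(x2 - x1) / x_range  # fraction of the plot width
        dy = abs(y2 - y1) / y_range  # fraction of the plot height
        if dx > 0:
            slopes.append(dy / dx)  # physical slope on a square plot
    # Scaling the height by k multiplies every physical slope by k,
    # so k = 1 / median makes the median slope exactly 1.
    return 1 / statistics.median(slopes)


ratio = banked_aspect_ratio([0, 1, 2], [0, 1, 0])
width = 4.0  # inches, hypothetical
print(ratio, width * ratio)  # 0.5 2.0: the plot is half as tall as wide
```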


Multiple frames. For a display with multiple graphs, specify the number of rows or
columns of graphs to appear on the display. By default, SYSTAT arranges the individual
displays to maximize their size. The maximum number of rows or columns equals the
number of graphs in the current display.
Zoom/Rotate


X/Y/Z. Move a slider to zoom from 0% to 200% in the X, Y, or Z direction, as
applicable. The default size is 100%. A 0% to 200% scale below the Z direction slider
shows the current zoom level. To find the current zoom along a direction, place the
cursor on that direction's slider.


Slide all. Select to move the sliders for all available directions in unison, so that the
graph is zoomed uniformly in all directions.


Press the Reset button to cancel all previous uncommitted zoom operations; the graph
then reverts to its original default size.


Note:

■ The sliders remain disabled if an individual frame of a multi-frame graph has
previously been zoomed.

■ Zoom operations automatically update the Size values in the Layout tab of the Graph
group.
Rotate. You can rotate 3-D graphs. A graph with a four-variable triangular coordinate
system can also be rotated.

Click any of the four rotation buttons to change the viewing angle accordingly.

Select Auto to rotate the graph automatically.

Select Using mouse to rotate the graph by moving the mouse.

Select None to stop rotation.

Press the Reset button to cancel all previous uncommitted rotation operations. The
graph then reverts to its original default orientation.
Frame


Tabs in the Frame group are: Options, Font, Zoom/Rotate, All Axes, and Layout.
Options


Title. In a multi-frame graph whose frames are generated by one or more grouping
variables, select this box to edit the title of the current frame in the adjacent box. To
enter a multi-line title, press the Enter key to go to the next line of the text box. Press
the Tab key to commit the frame title. You can use escape sequences such as \H for
double height and \> and \< for superscript and subscript. See the help file for the
WRITE command for further details.

If multiple frames are generated by specifying multiple variables in a single graph
command, as in BAR SALBEG SALNOW*JOBCAT, you cannot add a frame title.
For a single-frame graph, this box remains disabled. In this case, edit the title from
the Options tab of the Graph group.
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Background color. Select the box and click the adjacent color button to choose a dense 
color for the frame background. The frame background is transparent and colorless if 
the Background color check box is clear. 

Graph type. Change the graph to a similar type of display by clicking on an option in 
the list. 

Similar graph types are clubbed below: 


m Баг, dot, line, profile, pyramid chart 

W histogram, gap histogram, frequency polygon, fuzzygram 

ш dot histogram, symmetrical dot density, jittered dot density, density stripes 
ш normal curve, kernel curve 


Coordinate system. Select from the list of relevant coordinate types. For graphs such as
maps, scatterplots, function plots (FPLOT), and icon plots with icon locations, you can
apply a projection.


Projection Type. When Projection is selected in Coordinate system, you can select a
projection transformation for the graph from the list. SYSTAT provides the following
projection options:

'Oblique gnomonic', 'Oblique stereographic', 'Oblique orthographic', 'Mercator
conformal', 'Lambert cylindrical', 'Miller cylindrical', 'Robinson', 'Peters', 'Sinusoidal',
'Fisheye'.
Font




All font-related changes on this tab apply only to the frame title, not to other text
such as the graph title, axis titles, or tick labels.


Font. Select the font type for frame title. 


Style. Select the font style for frame title. 


Size. Type or select a font size for the frame title. Font size may vary from 1 to 72.


Rotation. Type or select an angle (in degrees) to rotate the frame title. The rotation
angle may vary from -360 to 360.


Color. Click the color button and choose a color to set font color for the frame title. 
Background color. Select the box and click the adjacent color button to choose a dense
color for the frame title background. The frame title background is transparent and
colorless if the Background color check box is clear.

Strikethrough. Select to apply a strikethrough font to the frame title, so that a
horizontal line is drawn through the text.


Case. Select an option to modify the case of the frame title text. Select upper or lower
case to convert the text entirely to upper or lower case. Current, which is the default
option, displays the text in the case in which you typed it.


Layout 




Location from origin. You can set the position of the frame on the Page View window 
by typing or selecting values in Horizontal and Vertical boxes as x-axis and y-axis 
coordinates, respectively. The default is the location of the selected frame. 
Size. In the Height, Width, and Altitude (enabled for 3-D displays) boxes, type values
to set the dimensions of the frame, with the axis titles and frame title taken into account.


Slope adjustment. Select to adjust the frame scales automatically. SYSTAT sets the height
and width of the frame so that the median absolute physical slope is 1, which corresponds
to a 45-degree angle on the page.
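
This section does not say how SYSTAT computes the adjustment. The Python sketch below shows one way to reach a median absolute physical slope of 1, following the banking-to-45-degrees idea described by Cleveland (1993). The helper name `banked_aspect`, and the assumption that slopes are measured on segments joining consecutive points, are illustrative choices rather than SYSTAT's documented method.

```python
import statistics


def banked_aspect(xs, ys):
    """Return the frame aspect ratio (height / width) that makes the median
    absolute physical slope of consecutive line segments equal to 1.

    Hypothetical helper: assumes the frame maps the full x range onto the
    frame width and the full y range onto the frame height.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = []
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x1 == x0:
            continue  # vertical segments have no finite slope
        # Slope in range-normalized units (as drawn in a unit-square frame).
        slopes.append(abs((y1 - y0) / y_range) / abs((x1 - x0) / x_range))
    # In a frame of height H and width W, each drawn slope is s * H / W,
    # so H / W = 1 / median(s) puts the median drawn slope at exactly 1.
    return 1.0 / statistics.median(slopes)
```

For example, `banked_aspect([0, 1, 2, 3], [0, 1, 0, 1])` returns 1/3 (up to rounding), so a frame 3 units wide would be drawn 1 unit high. The sketch needs some spread in both x and y.
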


Zoom/Rotate 




X/Y/Z. Move a slider to zoom from 0% to 200% in the X, Y, or Z direction, where
applicable. The default size is 100%. A scale of 0% to 200% below the Z direction slider
shows the current zoom level. Place the cursor on the slider for a direction to see the
current zoom along that direction.


Slide all. Select to move the sliders in all the available directions in unison at the same 
level of zoom, so that the graph is zoomed uniformly from all possible directions. 
Press the Reset button to cancel all previous uncommitted zoom operations;
the frame then reverts to the original default size.


Rotate. You can rotate 3-D frames. A frame with four-variable triangular coordinates
can also be rotated.

Click any of the four rotation buttons to change the viewing angle accordingly.

Select Auto to rotate the frame automatically. 

Select Using mouse to rotate the frame by moving the mouse. 

Select None to stop rotation. 

Press the Reset button to cancel all previous uncommitted rotation operations.
The frame then reverts to the original default orientation.


All Axes 




Axes type. Select the axis configuration from the list. 


Scale type. Select the scales to display from the configurations in the list. 

Except for the Cross option, the axes type and the scale type can be chosen
independently of each other. If the axes (scale) type is Cross, the only possible scale
(axes) types are Cross and None.

Tick mark style. From the list, select your choice for the tick mark style. The tick 
marks can be placed inside, outside, or through the axes. 

Tick mark location. The available options are Flush, Indent, and Float. By default
(Flush), the first and last tick marks are flush with the minimum and maximum. To keep
the first and last tick marks on the various axes from overlapping, select Indent. Float
allows SYSTAT to select "nice" scale numbers.
If the axes or scale type is Cross, the Indent option is ignored.
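
The rule SYSTAT uses for "nice" scale numbers is not given here. A common technique is Paul Heckbert's "nice numbers" algorithm (Graphics Gems, 1990), which rounds the tick step to 1, 2, or 5 times a power of ten. The Python sketch below implements that technique, not SYSTAT's own code; `nice_number` and `nice_scale` are hypothetical names, and the data range must have nonzero width.

```python
import math


def nice_number(x, round_result):
    """Heckbert's nice number: 1, 2, 5, or 10 times a power of ten near x."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent
    if round_result:
        nice = 1 if fraction < 1.5 else 2 if fraction < 3 else 5 if fraction < 7 else 10
    else:
        nice = 1 if fraction <= 1 else 2 if fraction <= 2 else 5 if fraction <= 5 else 10
    return nice * 10 ** exponent


def nice_scale(data_min, data_max, ticks=5):
    """Return (axis_min, axis_max, step) with nice tick values that
    enclose the data range."""
    span = nice_number(data_max - data_min, round_result=False)
    step = nice_number(span / (ticks - 1), round_result=True)
    # Extend outward to the nearest multiples of the step.
    axis_min = math.floor(data_min / step) * step
    axis_max = math.ceil(data_max / step) * step
    return axis_min, axis_max, step
```

For data running from 3.2 to 97.4, `nice_scale(3.2, 97.4)` gives an axis from 0 to 100 with ticks every 20, instead of starting exactly at the data minimum as Flush does.
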


Transpose X- and Y- axes. Select to swap the positions of the x and y axes. The
display rotates by 90 degrees.


Axis

The tabs in the Axis group are Display, Font, Options, and Line. Depending on the item
chosen from Axes type or Scale type on the Frame => All Axes tab, the relevant axes
are shown in a list above the tabs.
For example, in a three-dimensional graph, if the chosen axes or scale type is Box, the
list of axes shown in the Axis group contains Bottom front, Bottom right, Bottom
rear, Bottom left, Top rear, Top right, Top left, Top front, Front right, Front left,
Rear right, and Rear left.

Choose an item from the list to apply changes onto that axis. 

For pie charts, the Axis group remains disabled.


Display 



Label. You can edit the label of the axis. The default label is the name of the variable
specified in that axis, a count, any default name that appears for a particular chart,
or the label specified for the data variable.

Minimum/Maximum. Type the maximum or minimum value to appear on the axis. Any
data value outside these limits does not appear on the display.
Reverse scale. Select this box to arrange the values of ticks in reverse order, from
maximum to minimum for the selected axis.


Decimals. Type or select the number of digits (0 to 14) to be displayed after the
decimal point in the label on each tick mark.

Tick mark intervals.

■ Labeled (Tick). Type or select a value in the box to determine the number of
intervals to be marked by ticks between the minimum and maximum values on the
axis.

■ Unlabeled (Pip). Type or select a value in the box to determine the number of
intervals to be marked between adjacent tick marks.


Font 


All font-related changes on this tab apply only to the selected axis, not to other text
such as the graph title, frame title, or labels applied to elements.
Apply to. This box contains two options, Axes label and Tick label. Select an option to
apply subsequent font changes to it.


Font. Select the font type for the Axes label or Tick label selected from the list of the 
Apply to box. 


Style. Select the font style for Axes label or Tick label selected from the list of the 
Apply to box. 


Size. Type or select the font size for the Axes label or Tick label selected from the list of
the Apply to box. Font size may vary from 1 to 72.


Rotation. Type or select an angle (in degrees) to rotate the Axes label or Tick label 
selected from the list of the Apply to box. Rotation angle may vary from -360 to 360. 
Color. Click the color button and choose a color to set the font color of the Axes label 
or Tick label selected from the list of the Apply to box. 

Background color. Select the box and click the adjacent color button to choose a dense 
color for the background of the Axes label or Tick label selected from the list of the 
Apply to box. The background is transparent and colorless if the Background color 
check box is clear. 


Strikethrough. Select to apply a strikethrough font to the Axes label or Tick label
selected from the list of the Apply to box, so that a horizontal line is drawn through the text.


Case. Select an option to modify the case of the Axes label or Tick label text. Select
upper or lower case to convert the text entirely to upper or lower case. Current, which
is the default option, displays the text in the case in which you typed it.
Options




Transform. From the list you can select Log or Power to apply the corresponding 
transformation along that axis. None, which is the default option, means that no 
transformation is done on the axis. 

Base. If the Log option is selected in the Transform box, type or select a value between 
2 and 10 (both inclusive) to designate the base value of the logarithm. The default value 
is 10. 


Power. If the Power option is selected in the Transform box, type or select a value 
between -3 and +3 (both inclusive). The default value is 1. 
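
To illustrate what the Transform, Base, and Power settings do to axis positions, here is a hypothetical Python helper, `axis_position`; it is not SYSTAT code. It enforces the documented ranges (base 2 to 10, power -3 to +3). It rejects a power of 0 because this section does not say how SYSTAT treats that value.

```python
import math


def axis_position(value, transform="none", base=10, power=1):
    """Map a data value to its coordinate on a transformed axis."""
    if transform == "none":
        return value
    if transform == "log":
        if not 2 <= base <= 10:
            raise ValueError("log base must be between 2 and 10")
        if value <= 0:
            raise ValueError("log transform needs positive values")
        return math.log(value, base)
    if transform == "power":
        # Power 0 is excluded here because its handling is not documented.
        if not -3 <= power <= 3 or power == 0:
            raise ValueError("power must be nonzero and between -3 and +3")
        return value ** power
    raise ValueError(f"unknown transform: {transform!r}")
```

On a base-10 log axis, the values 10, 100, and 1000 land at the equally spaced positions 1, 2, and 3.
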


Lower/Upper. In the Limit lines boxes, type values within the axis range to display
lines perpendicular to the axis at those values.
The value in the Upper box should be greater than the value in the Lower box.
Display grid line. Select to display grid lines perpendicular to the axis at each tick
mark.


Line


Apply to. Select an item from the list to apply subsequent aesthetic changes to that line.
The list contains Axis line and Tick, and also Grid, Limit lower, and Limit upper when
they are applicable.

Style. Select line style like solid line, dotted line, dashed line, etc. for the item chosen 
from the list of the Apply to box above. 

Color. Click the color button to select a color for the item chosen from the list of the 
Apply to box above. 

Width. You can type or select a value between 0 and 10 to specify the width or thickness
of the item chosen from the list of the Apply to box above.
Legend



The Legend group is enabled when the graph has at least one legend. All the types of
legends that appear in the graph are listed below the group icons.

Select a type of legend to change its options. If there is no legend, the Legend group
remains disabled. The Legend group also remains disabled for a legend generated by
the TILE option. The only tab in the Legend group is Options.
Options




Display legend. Clear the box to stop display of that legend type. 
By default, the box is selected. 


Title. This box contains the default title of the legend selected. You can edit the text of 
the title. 

Label and Change to. The Label box lists all the labels of the selected legend. You can 
edit the name of the chosen label from the Change to box. 

Location. You can set the position of the legend on the Page View window by typing or
selecting values in the Horizontal and Vertical boxes as x-axis and y-axis coordinates,
respectively. The default is the current location of the selected legend.

Layout. Arrange the labels of the selected legend in the desired number of rows and
columns. By default, SYSTAT arranges the legend labels in a column. The maximum
number of rows or columns is the number of labels in the legend.
Element




You can modify the features of a constituent element (e.g., a bar in bar charts or a
symbol in dot charts and plots) in a graphical display from the Element group. To apply
aesthetic changes to a particular element, open the Graph Properties dialog box from
that element. Selecting Apply all then applies the changes to all similar elements of
the graph.

Tabs that appear in the Element group depend on the type of graphical display.
In addition, depending on the type of graph, some tabs may remain disabled.
Examples of tabs in the Element group are Display, Options, Error Bars, Font, and
Fill.

Different graph types have different Options tabs. Dot histogram, Symmetrical dot
density, Jittered dot density, Density stripes, Andrews' Fourier Plot, Parallel
Coordinate Display, Icon Plot, FPLOT, and Map do not have an Options tab.
Options (For Bar, Dot, Line, Profile, and Pyramid displays)




Percentage chart. Select to plot values as a percentage of the sum. 


Median instead of the mean for height. For two-dimensional charts involving a y- 
variable or three-dimensional charts involving a z-variable, select to display the 
median of the y or z variable respectively, instead of the mean. 


Anchor bars at. Select the box and type a value in the adjacent box to compare the
variable with respect to that level. The elements extend up or down from the value
you enter. For line, dot, and profile charts, this option remains disabled.


Connect dots in left-to-right order. Select to connect the dots with a line, in sequence
from left to right. For bar, pyramid, and profile charts, this option remains disabled.

Bar width. Type or select a value to change the thickness of elements. The valid range
is 0 to 1. The default is 0.50. For line, dot, and profile charts, this option remains disabled.
Display Labels. Select to display the exact value represented by each bar or pyramid in
two-dimensional charts.


Size. When Display Labels is selected, you can type or select a value to specify the font
size for the label. The valid range is 0 to 10 (both exclusive).

Note:

■ Three-dimensional bar and pyramid graphs do not show labels, so in these cases
the Display Labels check box and the Size box remain disabled.

■ For dot, profile, and line displays, the Display Labels check box and the Size box
remain disabled.


Options (For Histogram, Gap histogram, Frequency polygon, Fuzzygram, Density 
Function chart) 




Cumulative frequency. Select to set each bar's area as the sum of the preceding bar's 
area and its own incremental area. 
Number of bars. Type or select a number between 1 and 200 to specify the number of
bars (intervals or bins) to be displayed. The size of each interval depends on the range
of the data. For the density function chart, it remains disabled.


Bar width. Type a value to specify the width of each interval. The actual number of
bars depends on the range of the data. If you also specify the number of bars, this option
takes precedence. For the density function chart, it remains disabled.
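
The interplay between Number of bars, Bar width, and Cumulative frequency can be sketched in Python as follows. This is a simplified reading of the options above, not SYSTAT's binning code: `histogram` is a hypothetical helper, bins start at the data minimum, and a value equal to the maximum is placed in the last bin.

```python
import math
from itertools import accumulate


def histogram(data, n_bars=None, bar_width=None):
    """Return (edges, counts, cumulative counts) for equal-width bins.

    When bar_width is given it takes precedence over n_bars.
    """
    lo, hi = min(data), max(data)
    if bar_width is not None:
        n_bars = max(1, math.ceil((hi - lo) / bar_width))
    elif n_bars is not None and 1 <= n_bars <= 200 and hi > lo:
        bar_width = (hi - lo) / n_bars
    else:
        raise ValueError("give bar_width, or n_bars from 1 to 200 with spread data")
    edges = [lo + i * bar_width for i in range(n_bars + 1)]
    counts = [0] * n_bars
    for v in data:
        # Bin index from the distance to the minimum; clamp the maximum
        # value into the last bin.
        counts[min(int((v - lo) / bar_width), n_bars - 1)] += 1
    # Cumulative frequency: each bar adds its own count to all earlier ones.
    return edges, counts, list(accumulate(counts))
```

With the integers 0 to 10 and five bars, the counts are [2, 2, 2, 2, 3] and the cumulative counts [2, 4, 6, 8, 11]; asking for a bar width of 5 at the same time overrides the five bars and yields two.
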


Tension. This box is active only when the kernel option is chosen in the density
function chart display. Here you can type or select the degree to which the line or
surface is allowed to flex locally to fit the data. Specify a value between 0 and 1. Lower
values allow a greater degree of local flex.
Options (For Box plot display)




Add notches to mark confidence intervals. Select to notch (narrow) the boxes at
the median. The boxes return to full width at the lower and upper confidence interval
values.
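
This section does not give the formula SYSTAT uses for notch limits. The Python sketch below applies the widely used rule of McGill, Tukey and Larsen (1978), median ± 1.57 × IQR / √n, as an assumed stand-in; the helper name `notch_limits` is hypothetical.

```python
import math
import statistics


def notch_limits(data, k=1.57):
    """Return (lower, upper) notch limits around the median.

    Assumes the McGill-Tukey-Larsen rule: median +/- k * IQR / sqrt(n),
    with k = 1.57 giving roughly a 95% interval for comparing medians.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    half_width = k * (q3 - q1) / math.sqrt(len(data))
    return median - half_width, median + half_width
```

Because the notch half-width shrinks with √n, larger samples produce tighter notches. When the notches of two boxes do not overlap, that is rough evidence that their medians differ.
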


With symmetrical dot density. Select to overlay a box plot and a dot density plot in 
the same display. 
Options (For Pie chart display)




Separate slice from pie. Select this box and type or select a number in the adjacent
box to separate that category slice. The maximum value shown in the list of the box is
the number of levels of the category variable; any value greater than that maximum is
an invalid entry.

Attention map (Ring). Select to draw a ring plot, or attention map, which is a set of
concentric rings beginning with the smallest ring for the first category. The radius of each
ring is the sum of the previous radii plus the amount due to the corresponding category.


Display scale. Select to display scales of the category levels. 


Transform. From the list you can select Log or Power to apply the corresponding 
transformation to the value of the slice variable. None, which is the default option, 
means that no transformation is done on the slice variable. 
Base. If the Log option is selected in the Transform box, type or select a number
between 2 and 10 (both inclusive) to designate the base value of the logarithm. The
default value is 10.


Power. If the Power option is selected in the Transform box, type or select a number
between -3 and +3 (both inclusive) for the exponent. The default value is 1.


Display slice labels. Select to display the percentage represented by each slice. 


Size. When Display slice labels is selected, you can type or select a value to specify the 
font size for the slice label. 


Options (For Scatterplot, PPLOT, QPLOT, HILO plot) 


Confidence ellipse. You can draw Gaussian bivariate ellipses for the sample in each
plot or Gaussian bivariate confidence intervals on the centroid.


■ Sample. Select to center the ellipse on the sample means of the x and y variables.
The unbiased sample standard deviations of x and y determine its major axes and
the sample covariance between x and y determines its orientation. You can type or
select the size of the ellipse by specifying a probability value between 0 and 1 (both
exclusive) in the p box. The default is 0.6827.

■ Centroid. Select to center the ellipse on the sample means of the x and y variables.
The unbiased sample standard deviations of x and y determine its major axes and
the sample Pearson correlation between x and y determines its orientation. You can
type or select the size of the ellipse by specifying a probability value between 0 and
1 in the p box. This size is adjusted by the sample size so that the ellipse is always
smaller than that produced using Sample. The default is 0.95.


■ None. No ellipse is drawn.

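
The geometry of these ellipses can be sketched in Python under stated assumptions. SYSTAT's exact scaling is not documented here, so the sketch assumes that the ellipse is the bivariate normal contour enclosing probability p, which means scaling by √(-2 ln(1 - p)). It also assumes that a centroid ellipse uses the covariance matrix divided by n. The function name `bivariate_ellipse` is hypothetical.

```python
import math


def bivariate_ellipse(xs, ys, p=0.6827, centroid=False, steps=36):
    """Return points on a Gaussian bivariate ellipse for the data.

    Assumptions (not taken from SYSTAT): the ellipse is the contour of a
    bivariate normal enclosing probability p, scaled by sqrt(-2 ln(1 - p));
    for a centroid ellipse the covariance matrix is divided by n.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Unbiased sample variances and covariance.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if centroid:
        sxx, syy, sxy = sxx / n, syy / n, sxy / n
    # Eigenvalues of the 2 x 2 covariance matrix give the squared semi-axes;
    # the major axis points along angle theta.
    half_diff = math.hypot((sxx - syy) / 2, sxy)
    major = (sxx + syy) / 2 + half_diff
    minor = max((sxx + syy) / 2 - half_diff, 0.0)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    k = math.sqrt(-2 * math.log(1 - p))
    a, b = k * math.sqrt(major), k * math.sqrt(minor)
    points = []
    for i in range(steps):
        t = 2 * math.pi * i / steps
        u, v = a * math.cos(t), b * math.sin(t)
        # Rotate the axis-aligned ellipse by theta and shift to the means.
        points.append((mx + u * math.cos(theta) - v * math.sin(theta),
                       my + u * math.sin(theta) + v * math.cos(theta)))
    return points
```

Dividing the covariance by n is what makes the centroid ellipse shrink as the sample grows, in line with the description of the Centroid option above.
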
Confidence kernel. Select the box and type or select a probability value between 0 and
1 (both exclusive) in the adjacent box to display the concentration of data in the sample
through nonparametric kernel density estimation.


Convex hull. Select to draw a convex hull around all the points in the scatterplot. 
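
To show concretely what the hull is, here is a Python sketch using Andrew's monotone chain algorithm, a standard O(n log n) method. SYSTAT may compute its hull differently, and the function name is illustrative.

```python
def convex_hull(points):
    """Return the convex hull vertices in counterclockwise order
    (Andrew's monotone chain algorithm)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Interior points, such as (1, 1) inside the square with corners (0, 0) and (2, 2), never appear in the result.
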


Influence on correlation coefficient. Select to make the size of the plot symbol
represent the extent of influence each point exerts on the Pearson correlation
coefficient. A scale in the form of a legend to the right of the plot helps you judge the
extent of influence. (The influence of a point is the amount the correlation would
change if that point were deleted.)
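
The influence measure defined above, the change in the correlation when a point is deleted, can be computed by brute-force leave-one-out. The Python sketch below reports r(without the point) minus r(all points). This section does not state whether SYSTAT sizes symbols by this signed value or by its magnitude, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_influence(xs, ys):
    """Change in r when each point is deleted: r(without i) - r(all)."""
    r_all = pearson(xs, ys)
    return [pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) - r_all
            for i in range(len(xs))]
```

For the points (1, 1), (2, 2), (3, 3), (4, 4), and (10, -10), the last point dominates: deleting it moves r from about -0.87 to 1.
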

Overlapping data. Select an option from the list to distinguish overlapping data.

■ Jitter. Select to add slight random jitter to distinguish points.

■ Sunflower. Select to draw shaded sunflower symbols. Sunflowers are lighter or
darker depending on the number of cases; nine shades are possible. Larger counts in
overlapping data are plotted with filled circles.

■ Overlap. Select to display overlapping data as a single point. This is the default
option.

Use hexagonal binning. Select to apply hexagonal binning to the 2-D plot. This box is
selected automatically if the global option 'Threshold for hexagonal binning' applies to
the data or if the hexagonal binning option is specified in the command.




Number of hex grid cuts. If the Use hexagonal binning box is selected, you can type
the number of cuts in the hexagonal binning grid in this edit box. The maximum
number of cuts is 50 and the minimum is 2. If the global option 'Threshold for
hexagonal binning' applies to the data, the default value 25 appears in this box.
If the hexagonal binning option is specified in the command, that value appears in the
box. Hexagonal binning does not apply to the HILO plot.
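
This section does not document how SYSTAT lays out its hexagonal grid. The Python sketch below shows the general technique under stated assumptions. Both axes are rescaled so that `cuts` hexagon widths span each data range. Points are then assigned to pointy-top hexagons with the standard axial-coordinate rounding method. The function names are hypothetical.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial hex coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that
    # q + r + s stays 0.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hex_bin(points, cuts=25):
    """Count points per hexagon on a grid about `cuts` hexagons wide
    (a hypothetical reading of 'number of hex grid cuts')."""
    if not 2 <= cuts <= 50:
        raise ValueError("cuts must be between 2 and 50")
    xs, ys = zip(*points)
    x0, y0 = min(xs), min(ys)
    xspan = (max(xs) - x0) or 1.0
    yspan = (max(ys) - y0) or 1.0
    size = 1 / math.sqrt(3)  # a pointy-top hexagon of this size is 1 unit wide
    counts = Counter()
    for x, y in points:
        # Rescale both axes to [0, cuts] so hexagons look regular on screen.
        px = (x - x0) / xspan * cuts
        py = (y - y0) / yspan * cuts
        q = (math.sqrt(3) / 3 * px - py / 3) / size
        r = (2 / 3 * py) / size
        counts[_cube_round(q, r)] += 1
    return counts
```

Each key of the returned counter identifies one hexagon, and its value is the number of cases to be represented by that cell.
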


Lines (For Scatterplot, PPLOT, QPLOT, HILO plot) 




Line connected in case order. Select to connect points in case order. 


Traveling salesman path. Select to find the shortest possible closed path that connects 
all the points with no repetitions. 
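
To make the definition concrete, the Python sketch below finds the shortest closed path by trying every ordering of the points. This exhaustive search is only practical for about ten points or fewer. Practical software generally relies on faster heuristics, and this section does not say which method SYSTAT uses. The function name is hypothetical.

```python
import itertools
import math


def shortest_closed_tour(points):
    """Exhaustively find the shortest closed path visiting each point once.

    Runs in factorial time, so it is only usable for a handful of points.
    """
    first, rest = points[0], points[1:]
    best_length, best_tour = math.inf, None
    # Fixing the first point avoids re-checking rotations of the same tour.
    for order in itertools.permutations(rest):
        tour = (first,) + order
        # Sum the legs, including the closing leg back to the start.
        length = sum(math.dist(a, b) for a, b in zip(tour, tour[1:] + tour[:1]))
        if length < best_length:
            best_length, best_tour = length, tour
    return list(best_tour), best_length
```

For the four corners of a unit square, the shortest closed path runs around the perimeter with length 4; crossing the diagonals would be longer.
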
Minimum spanning tree. Select to connect all the points so that the sum of the
lengths of the connecting line segments is minimized. Note that no closed region is
formed in this case.
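
A minimum spanning tree can be computed with Prim's algorithm on the complete graph of pairwise Euclidean distances, as in the Python sketch below. This standard textbook method is shown for illustration only. SYSTAT's own implementation is not described here, and the function name is hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Return MST edges as (i, j) index pairs using Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from each point to the tree
    parent = [None] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the outside point with the cheapest link to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

The result always has one edge fewer than there are points. Because a tree contains no cycle, no closed region is formed, matching the note above.
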


Vertical spikes to Y. Select the box and type a value in the adjacent box to draw spikes 
from that specified value of the y-axis in 2-D display and of the xy plane in 3-D display. 


Vector lines. Select to connect each point to a single point that you specify in the 
adjacent boxes for the relevant axes. 


Delaunay triangulation. Select to partition the region spanned by the points into
triangles whose vertices are the data points. Two points are joined when their Voronoi
polygons share an edge, so the triangulation is the dual of the Voronoi tessellation.


Voronoi tessellation. Select to produce straight boundaries halfway between points. 
This option presumes you have equivalent distances on the x and y axis. This is also 
known as the Dirichlet tessellation or the Thiessen diagram. 
Smoother (For Scatterplot, PPLOT, QPLOT, HILO plot)




Method. Select one of the 19 items in the list to fit a smoother. The items are:

'Linear', 'Quadratic', 'Log', 'Power', 'LOWESS', 'DWLS', 'Spline', 'Step', 'NEXPO',
'Inverse', 'Mean', 'Median', 'Mode', 'Midrange', 'Andrews', 'Bisquare', 'Huber', 'Trimmed',
'Kriging'.

None means no smoother is applied. 

When the 'Kriging' smoother is selected, you can select 'Angle', 'Order', 'Ratio' and type 
the corresponding values in the newly enabled box. For further details of smoother 
methods, refer to the Smoother tab description in Chapter 5: Scatterplots. 


Tension. For the LOWESS, DWLS, inverse, mode, median, mean, and trimmed 
smoothing methods you can specify the degree to which the line or surface is allowed 
to flex locally to fit the data. Specify a value between 0 and 1. The default value is 0.5, 
meaning that half the points are included in the running window. You can increase or 
decrease this value to increase or decrease the number of points in the window. 
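
As a sketch of how tension sets the running window, the Python function below implements a simple running-mean smoother. Each fitted value averages the y values of the ceil(tension * n) cases nearest in x. It illustrates only the Mean method under that reading; LOWESS, DWLS, and the other methods weight the window differently. The function name is hypothetical.

```python
import math


def running_mean(xs, ys, tension=0.5):
    """Fitted value at each x: mean y of the ceil(tension * n) nearest cases."""
    if not 0 < tension < 1:
        raise ValueError("tension must lie between 0 and 1")
    n = len(xs)
    window = max(1, math.ceil(tension * n))
    fitted = []
    for x0 in xs:
        # Sort case indices by distance in x; ties keep their original order.
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:window]
        fitted.append(sum(ys[i] for i in nearest) / window)
    return fitted
```

Raising the tension widens the window, so each fitted value averages more cases and the curve becomes smoother and flatter.
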
Limit to data range. Select the check box to limit the smoother domain to the data
range.


Confidence interval (linear). Select to get a confidence interval on the regression line
for two-dimensional displays. Type or select the confidence level of the bands (between 0
and 1, both exclusive) in the adjacent box. The default is 0.95 (95% confidence).


Residuals. You can plot the standardized residuals of the dependent variable. 
SYSTAT derives residuals for any of the 19 smoothers described. 
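
The linear confidence band and standardized residuals can be sketched together in Python. The sketch makes three assumptions, none of them documented for SYSTAT. The band is for the mean response. A normal quantile stands in for Student's t, which is a large-sample approximation. Residuals are standardized by the residual standard error only, ignoring leverage. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def linear_fit_with_band(xs, ys, x0, confidence=0.95):
    """Least-squares line, band at x0, and standardized residuals.

    Assumptions: the band is for the mean response, a normal quantile
    stands in for Student's t (a large-sample approximation), and
    residuals are standardized by the residual standard error only.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    fit = intercept + slope * x0
    # The band is narrowest at the mean of x and widens away from it.
    half = z * s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
    return fit, (fit - half, fit + half), [e / s for e in residuals]
```

The (x0 - mean)² term explains why confidence bands on a regression line flare outward toward the ends of the data range.
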
Surface (For Density polygon, Density Function, Scatterplot, FPLOT)




Type. From the list, select an item to specify the direction, color, or pattern of the cutting
surface in a 3-D graph.

■ X/Y/Z Cut Lines. Select to cut the surface with planes perpendicular to the x-, y-, or
z-axis.

■ XY Cut Lines. Select to cut the surface from two directions with planes
perpendicular to the x and y axes.

When a cut line is selected, the Gradient and Wireframe boxes are disabled.

■ Colored Solid. Select to apply multiple colors to the surface.

■ Patterned Solid. Select to display patterns in black and white. The Gradient and
Wireframe boxes are disabled in this case.

■ Shaded Solid. Select to produce shades of one color. This is the default option.


For 2-D mosaic of Scatterplot, this box remains disabled. 
Number of grid cuts. Type or select the number of cuts in the grid.

For a function plot, the default number of grid cuts is 35 x 35.

For 3-D kernel and normal density plots, the default number of grid cuts is 25 x 25.
For a smoother surface in a 3-D scatterplot, the default number of grid cuts is 11 x 11.
The maximum number of cuts is 60 and the minimum is 2. For a 2-D mosaic of
Scatterplot, this box remains disabled.


Gradient. This box is enabled when the Colored solid item is selected in the Type box.
You can then choose an item from the list to apply a gradient pattern of colors to the surface.
When the Gradient item is selected in the Wireframe box, select an item from the
Gradient box to apply the corresponding gradient color to the wireframe without any
fill.


Wireframe. Surface with Colored solid option from the Type box appears with a 
wireframe. Select an item from the list to apply that color to the wireframe. None 
means no wireframe is shown. Select the Gradient item to color the wireframe without 
any fill. In this case, the color of the wireframe is set by your choice of an item in the 
Gradient box. 
Error Bars (For Bar, Dot, Pyramid, Profile and Line displays)




SYSTAT provides conventional error bars to indicate the variability of the measure
displayed.

Type. Select the type of error bars to be displayed.

■ Standard error. Select to show standard error bars. Type or select a value between
0 and 1 for the confidence level in the adjacent p box. For one standard error of the
mean (the default), the p value is 0.6827.

■ Standard deviation. Select to show error bars in standard deviation units.
Type or select a value between 0 and 1 for the confidence level in the p box.
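
The link between p and the bar length can be sketched in Python. The sketch assumes the half-length is z × spread, where z is the two-sided standard normal quantile for p, so that p = 0.6827 gives z ≈ 1, that is, one standard error, matching the default described above. SYSTAT might use a t quantile instead, and the helper name `error_bar` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def error_bar(values, p=0.6827, kind="se"):
    """Return (low, high) error-bar limits around the mean.

    Assumption: the half-length is z * spread, where z is the two-sided
    standard normal quantile for p and spread is the standard error of
    the mean (kind="se") or the standard deviation (kind="sd").
    """
    if not 0 < p < 1:
        raise ValueError("p must lie between 0 and 1")
    spread = stdev(values)
    if kind == "se":
        spread /= math.sqrt(len(values))
    elif kind != "sd":
        raise ValueError("kind must be 'se' or 'sd'")
    z = NormalDist().inv_cdf((1 + p) / 2)  # about 1.0 when p = 0.6827
    m = mean(values)
    return m - z * spread, m + z * spread
```

Setting p to 0.95 stretches the bars to about 1.96 units of the chosen spread.
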


Width. Type or select a value between 0 and 1 to specify the length of the horizontal ticks
that bound the error bar when Line with ticks is selected in the Style box, or the
width of the error box when Box is selected in the Style box. A value of
0.5 therefore means the length (or width) is 50% of what a bar would be.
Style. Select the style for the error bars.

■ Line with ticks. Select to display error bars as vertical lines bounded by horizontal
ticks.

■ Box. Select to display error bars as vertical boxes.


Font 




This tab is enabled when a permissible label is specified for the elements of the chart.
For example, labels specified in a pie chart or in a two-dimensional bar or pyramid chart
to display their exact numeric values, or the string variable labels of maps and plots, are
edited here.

Font. Select the font type for the label specified. 

Style. Select the font style for the label specified. 

Size. Type or select the font size for the label specified. Font size may vary from 1 to 72. 
Rotation. Type or select an angle (in degrees) to rotate the specified label. Rotation
angle may vary from -90 to 90.


Color. Click the color button and choose a color to set the font color for the specified 
label. 


Background color. Select the box and click the adjacent color button to choose a dense 
color for the background of the specified label. The background of the specified label 
is transparent and colorless if the Background color check box is clear. 


Strikethrough. Select to apply a strikethrough font to the specified label, so that a
horizontal line is drawn through the text.


Case. Select an option to modify the case of the label text. Select upper or lower case
to convert the text entirely to upper or lower case. Current, which is the default
option, displays the text in the case in which you typed it.
Fill (For Bar, Profile, Pyramid, Pie, Histogram, Map)




Style. Select an item to choose how the fill style inside the element is specified.

■ Pattern. Select this item and choose a predefined pattern from the adjacent box.

■ Value. Select this item and type or select a value between 0 and 1 in the adjacent
box to apply the corresponding fill style. SYSTAT chooses the shade of gray nearest
to the number you specify.

Color. Select an item to choose how elements are filled with color.

■ Color. Select this item and click the color button to choose the fill color for the
element.

■ Value. Select this item and type or select a number to apply the corresponding
fill color to the element.
Boundary. You can change the style, width, and color of the boundary of an element in the
chart. From the Style box, select the line style of the boundary, such as solid, dotted,
or dashed line. From the Width box, type or select a value between 0 and 10 for
the width or thickness of the boundary. From the Color box, select a color for the
boundary.


Symbol (For Dot chart, Dot Density, Scatterplot, PPLOT, QPLOT, SPLOM, HILO 
Plot, Density Box, Icon Plot) 




From the Symbol tab, you can choose from a variety of distinctive symbols to mark the 
points on a graphical display like Dot chart, Dot Density, Scatterplot, PPLOT, QPLOT, 
HILO Plot, Density Box, Icon Plot. 

Type. Select an item to choose the type of symbol you want.

■ Symbol. Select this item and choose the symbol from the list in the adjacent box.
SYSTAT has 23 built-in symbols.

■ Character. Select this item and type a character in the adjacent box to use it as
the symbol.


Size. Type or select a number from 0 to 49 for the size of the symbol.


Fill. When the Symbol item is chosen from the Type box, you can fill symbols that have 
some bounded region with various patterns and also add color to that pattern. 
Boundary. When the Symbol item is chosen from the Type box, you can edit the 
boundary of the symbols. 

From the Style box you can select line style like solid line, dotted line, dashed line, etc. 
From the Color button you can choose a color for the boundary. 

In the Width box you can type or select a number between 0 and 10 to specify the width
of the symbol boundary.


Examples 


Example 1 
Selecting Cases 


The following scatterplot plots the projected 2020 population of 57 countries against their 1983 population.


The input is: 


USE OURWORLD 
PLOT POP_2020 * POP_1983 
We can use the selection tools to zoom in on the cases that cluster together in the lower left corner. When we select cases with the Region selection or Lasso selection tool, the selected cases are also identified in the Data Editor. Here, we use the Region selection tool.

When we run the scatterplot procedure again, only the selected cases are plotted, providing a more detailed view of the cases that were clumped together in the original scatterplot.
Example 2
Converting a Bar Chart to a Profile Chart


Here is a bar chart displaying the number of people at each education level.


The input is:


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college',
5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BAR EDUCATN$ / XLAB='EDUCATION'
The output is:

(Figure: bar chart of counts by EDUCATION)

From the Graph => Options tab of the Graph Properties dialog box, we can convert the chart to a profile chart.
Example 3
Converting a Bar Chart from Rectangular Coordinates to Polar Coordinates


Here is a bar chart in rectangular coordinates.
The input is:


USE IRIS 
BAR SPECIES 


The output is:

(Figure: bar chart of SPECIES, with the Graph Properties dialog box showing Graph type: Bar and the Polar option)

From the Graph => Options tab of the Graph Properties dialog box, you can change the coordinate system of the chart. Selecting Polar displays the bar chart in polar coordinates.
Example 4
Changing the Gradient of Surface Plots


Here is a surface plot of the function z = x^2 + y^2.


The input is:
FPLOT Z = X^2 + Y^2 


The output is:


From the Element => Surface tab of the Graph Properties dialog box, set Type to Colored solid and select a Gradient option to change the gradient of the surface plot.
Example 5
Changing the Gradient of a Wireframe


Here is a surface plot of the function z = x^2 + y^2, drawn with the Colored solid option.


The input is:
FPLOT Z = X^2 + Y^2 / SURFACE=COLOR


The output is:


From the Element => Surface tab of the Graph Properties dialog box, set Wireframe to Gradient to change the gradient of the wireframe, with no fill for the graph.


You can set Wireframe to None to remove the wireframe from the display, or select any other Wireframe option to draw the wireframe in that color.
Example 6
Applying Different Colors to the Boundary and Inside of a Symbol


Here is a scatterplot of PETALLEN by SEPALLEN from the IRIS data.


The input is:


USE IRIS 
PLOT PETALLEN*SEPALLEN / FILL=1 SIZE=3 COLOR=4
The output is:

(Figure: scatterplot of PETALLEN by SEPALLEN)

From the Element => Symbol tab of the Graph Properties dialog box, you can set the Boundary Color to black and the Width to 2 to give the symbols a thick black boundary.

(Figure: the same scatterplot with thick black symbol boundaries)
Example 7
Drawing Anchor Bars


Here is a bar chart displaying the education level of people from the SURVEY2 data.
The input is:


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college',
5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
BAR EDUCATN$ / XLAB='EDUCATN'


The output is:

(Figure: bar chart of counts by EDUCATN, with the Element => Options tab showing the Count, Percentage chart, and Use median instead of mean for height options)

From the Element => Options tab of the Graph Properties dialog box, you can anchor the bars at a base count so that they extend above or below that value.
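An anchored bar is drawn from a base value rather than from zero. Each bar therefore spans the interval between the base and the category's value: it rises above the base when the value is larger and hangs below it when the value is smaller. The Python sketch below computes those extents. The function name `anchored_bars`, the base value, and the counts are hypothetical illustrations, not SYSTAT's implementation or data.

```python
def anchored_bars(counts, base):
    """Return (category, bottom, top, direction) for bars anchored at `base`.

    Each bar spans the interval between the base value and the count, so
    it extends above the base when count >= base and below it otherwise.
    """
    bars = []
    for category, count in counts.items():
        bottom, top = min(base, count), max(base, count)
        direction = "above" if count >= base else "below"
        bars.append((category, bottom, top, direction))
    return bars


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example = {"group A": 30, "group B": 75, "group C": 55}
    for category, bottom, top, direction in anchored_bars(example, base=50):
        print(f"{category}: {bottom} to {top} ({direction} the base)")
```

Changing the base moves every bar's starting point, which is why anchored bars suit comparisons against a reference level rather than against zero.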
Example 8
Slicing a Pie Chart


Here is a pie chart of the variable EDUCATN from the SURVEY2 data, with its levels recoded into labeled categories.


The input is:


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
PIE EDUCATN$
The output is:

(Figure: pie chart of the education categories)

From the Element => Options tab of the Graph Properties dialog box, you can separate a slice from the pie.

(Figure: pie chart with one slice separated; slices labeled HS dropout, College grad, Some college)

Clear the Separate slice from pie option to rejoin the separated slice.
Example 9
Changing the Appearance of Points in a Scatterplot


Here is a scatterplot of SEPALLEN by PETALLEN from the IRIS data.
The input is:


USE IRIS 
PLOT SEPALLEN * PETALLEN 


The output is:

(Figure: scatterplot of SEPALLEN by PETALLEN)

Place the mouse pointer on one of the points and open the Graph Properties dialog box. From the Element => Symbol tab, you can change the symbol type, size, and fill options.

For the changes to affect all the points, you must select the Apply all option.

(Figure: scatterplot of SEPALLEN by PETALLEN with the modified symbols)


Example 10
Changing the Appearance of the Legend


Here is an overlaid scatterplot of INCOME, grouped by EDUCATN$, from the SURVEY2 data.
The input is:


USE SURVEY2
RECODE EDUCATN$ = EDUCATN/1,2='HS dropout' 3='HS grad',
4='Some college' 5='College grad',
6,7='Degree +'
ORDER EDUCATN$ / SORT='HS dropout', 'HS grad', 'Some college',
'College grad', 'Degree +'
PLOT INCOME / GROUP = EDUCATN$ OVERLAY SIZE=1.5


The output is:


Using the Legend => Options tab of the Graph Properties dialog box, you can now change the legend title and labels as follows:

(Figure: Legend => Options tab, with the Title field and a Change to field in which a label is edited to SOME COLLEGE)

You can now change the layout of the legend items to a matrix format as follows:

(Figure: Display legend options)
Example 11
Editing Axis Labels


Here is a scatterplot of SEPALLEN by PETALLEN from the IRIS data. 
The input is: 


USE IRIS 
PLOT SEPALLEN * PETALLEN 
The output is:


Using the Axis => Display tab of the Graph Properties dialog box, you can edit the axis labels as follows:
Graph Gallery


Alexander Khalileev 


The Graph Gallery consists of a subset of the graphs available in SYSTAT. This graphical "library" generates predefined graphs adapted to your data: instead of specifying a graph type, you simply select an image of the kind of graph you want to create. SYSTAT then produces the graph, prompting for the information needed to create it. Each gallery entry corresponds to a graphical template that is applied to the data.

The gallery can be customized to suit your needs. You can remove existing gallery 
items or edit them to prompt for other graphical features, such as axis labels. You can 
also create your own items. Grouping gallery items in separate folders yields multiple 
specialized galleries, such as a bar chart gallery or a gallery of three-dimensional 
graphs. Furthermore, you can create graph gallery entries that run statistical analyses 
or transform variables and files. 
Graph Gallery Dialog
To open the Graph Gallery, from the menus choose: 


Graph 
Graph Gallery... 




The Graph Gallery creates graphs using a predefined command syntax in conjunction 
with user-supplied graph specifications. An icon appears for each graph contained in 
the gallery. Use the scrollbar at the bottom of the gallery to scroll through the icons. 
Select a graph to create by clicking its icon. The name of the currently selected gallery 
item appears at the bottom of the Graph Gallery. Click OK or double-click an icon to 
begin creating the selected graph. 

The gallery organizes icons in two rows, using as many columns as are needed to display all icons. Icons are arranged alphabetically by item name in row-major order. In other words, SYSTAT fills all the columns in the first row before continuing to the next row. For example, in a gallery containing five items, the icons appear as follows:


Item 1 Item 2 Item 3 
Item 4 Item 5 
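The placement rule can be written in a few lines. The Python sketch below follows the text: two rows, with just enough columns (the number of items divided by two, rounded up), and the first row filled completely before the second begins. It sorts names case-insensitively, which is an assumption. `gallery_layout` is a hypothetical helper, not part of SYSTAT.

```python
import math


def gallery_layout(item_names, rows=2):
    """Arrange gallery item names alphabetically in row-major order.

    Uses the fewest columns that fit all items in `rows` rows and fills
    the first row completely before starting the next one.
    """
    names = sorted(item_names, key=str.lower)  # case-insensitive (assumed)
    columns = math.ceil(len(names) / rows)
    grid = []
    for r in range(rows):
        row = names[r * columns:(r + 1) * columns]
        if row:  # skip rows left empty when there are few items
            grid.append(row)
    return grid


if __name__ == "__main__":
    for row in gallery_layout(["Item 5", "Item 3", "Item 1", "Item 4", "Item 2"]):
        print("  ".join(row))
```

With the five items above, the sketch prints the same two rows shown in the text: three columns in the first row and the remaining two items in the second.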
Files for Gallery Items 


SYSTAT creates Graph Gallery entries based on the files contained in the folder 
assigned to the graph gallery. Each gallery item consists of two files: a command file 
and a bitmap file. 

■ Command file. Contains the underlying SYSTAT commands needed to create the desired graph. Usually this file employs token substitution to prompt for variables, files, numbers, or strings.

■ Bitmap file. Represents the output produced by the associated command file. The bitmap appears as the icon in the gallery.


The command and the bitmap files must have the same name, but use different 
extensions. Command files must use the SYC extension and bitmap files must use 
BMP. 

When you open the Graph Gallery, SYSTAT scans the gallery folder and generates 
a gallery icon for each SYC-BMP pair. The name of the command file (without the 
SYC extension) is assigned as the name of the gallery item and appears in the Graph 
Gallery dialog when you select an icon. All command files without a matching bitmap 
receive a default icon in the gallery. 
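The pairing rule can be sketched as a folder scan. The Python below is an illustration under stated assumptions, not SYSTAT's code. It compares extensions case-insensitively, skips subfolders, and uses `DEFAULT_ICON` as a hypothetical stand-in for the default icon. It also lists bitmaps that have no command file, because such bitmaps produce no gallery item.

```python
import tempfile
from pathlib import Path

DEFAULT_ICON = "<default icon>"  # hypothetical stand-in for SYSTAT's default icon


def scan_gallery(folder):
    """Pair .syc command files with same-named .bmp bitmaps in `folder`.

    Returns (items, orphans). `items` maps each gallery item name (the
    command file name without its extension) to its bitmap path, or to
    DEFAULT_ICON when no bitmap matches. `orphans` lists bitmaps that
    have no command file.
    """
    commands, bitmaps = {}, {}
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # subfolders are not gallery items in this sketch
        suffix = path.suffix.lower()  # assume extensions are case-insensitive
        if suffix == ".syc":
            commands[path.stem] = path
        elif suffix == ".bmp":
            bitmaps[path.stem] = path
    items = {name: bitmaps.get(name, DEFAULT_ICON) for name in sorted(commands)}
    orphans = sorted(p.name for name, p in bitmaps.items() if name not in commands)
    return items, orphans


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical file names, for illustration only.
        for name in ["Bubble Plot.syc", "Bubble Plot.bmp",
                     "Log Transformation.syc", "Old Item.bmp"]:
            (Path(folder) / name).touch()
        items, orphans = scan_gallery(folder)
        print(items)
        print(orphans)
```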

Selecting a gallery icon and clicking OK submits the corresponding command file. 
Because most items use tokens to tailor a graph to your specifications, the Graph 
Gallery activates token processing automatically. 


Contents 


Initially, the Graph Gallery contains several items designed to simplify graph creation. Each item begins with a brief description of the graph created; you can cancel execution of the item after reading the description. The following items appear in the default gallery:
3-D Line of Counts. Creates a three-dimensional line graph (also called a ribbon plot). The line represents the number of cases appearing in each combination of two categorical variables.


3-D Pyramid of Means. Creates a three-dimensional pyramid chart. The pyramids have heights equal to the average value of a continuous variable for each combination of two categorical variables.


Anchor Bar Chart. Creates a two-dimensional anchored bar chart. An anchored bar 
chart uses a base value with bars extending above or below the base value. 


ANOVA Interaction Overlay Plot. Creates a single plot of the average dependent 
variable value at each level of one categorical independent variable for each level of 
another categorical independent variable. Use this plot to illustrate interaction effects 
in two-way ANOVA models. After creating the graph, the data file in use contains the 
original data as well as ANOVA diagnostic statistics. 


ANOVA Interaction Plot. Creates plots corresponding to the two-way ANOVA 
interaction Quick Graphs. You must identify the continuous response variable and two 
categorical independent variables. Upon completion, the data file in use contains the 
original data and ANOVA diagnostic statistics. 


ANOVA Main Effect Plot. Creates a main effect plot corresponding to the ANOVA 
main effects Quick Graph. You must identify a continuous dependent variable and a 
categorical independent variable. Upon completion, the data file in use contains the 
original data and ANOVA diagnostic statistics. 


Bivariate Kernel and Normal density with Tile. Creates a bivariate kernel and normal density graph with TILE on the Z-axis.


Border Displays (3D). Plots two continuous variables against each other. The 
distributions of each variable appear at the borders of the plot. 


Bubble Plot. Creates a two-dimensional scatterplot in which the size of the plotting 
symbol represents the values of a third variable. 
Cluster Bar. Produces a bar chart showing counts at each level of a categorical variable for each subpopulation. Bars for all subpopulations appear at every level of the categorical variable.


Count as Percent of All Cases. Calculates the count for each level of a categorical variable and determines the number of cases in the current file. To create the chart, this item closes the current file and uses a file of summary statistics to create the bar graph of percents.


Count as Percent of Valid Cases. Calculates the count for each level of a categorical 
variable and determines the number of valid cases in the current file. To create the 
chart, this item closes the current file and uses a file of summary statistics to create the 
bar graph of percents. 


Discriminant Analysis Canonical Means Plot. The plot created here is similar to the Discriminant Analysis Quick Graph. However, instead of plotting all of the observations, it plots only the group means. Two temporary files containing the statistics needed to create the plot are created and subsequently deleted. Upon completion, no data file is open.


Discriminant Analysis Misclassification Plot. The plot created here is similar to the 
Discriminant Analysis Quick Graph. However, instead of plotting all of the 
observations, only the misclassified cases appear. The mean of each group is placed at 
the center of the corresponding ellipse to aid in identifying the group. Two temporary 
files containing statistics necessary to create the plot are created but are subsequently 
deleted. Upon completion, no data file is open. 


Dot Histogram with Kernel Smoother. Shows the sample density of a continuous variable using stacks of dots and a kernel curve. The height of each stack represents the number of cases that fall within a given interval.


Dot Histogram with Normal Smoother. Shows the sample density of a continuous variable using stacks of dots and a normal curve. The height of each stack represents the number of cases that fall within a given interval.


Dual Area. Displays profile charts of means for two subpopulations in a single frame. One subpopulation uses the upper portion of the graph and the other uses the lower portion.
Dual Histogram. Displays histograms representing the densities for two subpopulations back-to-back in a single frame.


Eclipse. Generates graphs of solar eclipses through 2005 using the ECLIPSESELECT.SYD file.


Global Map with Contour. Creates a global map with a contour plot overlay. It requires a map file. You are prompted to choose a variable and tick value, as well as the longitude and latitude rotation of the globe.


Globe. Creates a globe representing Earth. Use bigworld.syd as the data file. 


Histogram with Kernel Smoother. Shows the sample density of a continuous variable 
using a series of vertical bars and a kernel curve. The height of each bar represents the 
number of cases that fall within a given interval. 


Histogram with Normal Smoother. Shows the sample density of a continuous variable 
using a series of vertical bars and a normal curve. The height of each bar represents the 
number of cases that fall within a given interval. 


Horizontal Bars. Creates a horizontal bar chart displaying the average for a continuous 
variable at each level of a categorical variable. 


Kernel and Normal Contours. Creates two plots representing the density of two 
continuous variables as contours. One plot uses bivariate normal contours and the other 
uses nonparametric kernel contours. 


Kernel and Normal Mosaics. Produces a plot representing the density of two 
continuous variables as shaded contours. 


Map Projection. Creates a map using a specified projection method, allowing for 
control over scale and axis display. 


Map with Labels. Produces a labeled map. The data file must contain variables named 
lablon and lablat. 


Mosaic Plot. Uses shades of color to display the number of cases in each combination 
of two categorical variables. 
Multiplot. Creates a multiplot, a plot similar to a Trellis display, in which two continuous variables are plotted at each level of one or two categorical variables.


Multiplot with Contour. Creates a table of contour plots corresponding to each level 
of one or two categorical variables. 


Multiplot with Tile. Creates a table of mosaic plots corresponding to each level of one 
or two categorical variables. 


Outliers and Influence (Linear Regression). This item is a diagnostic plot for simple linear regression. The dependent variable is plotted against the independent variable with the least squares regression line added. Cases with studentized residuals greater than 2 in absolute value are labeled as outliers. Cases with Cook's distances exceeding 0.5 are labeled as influential.


Partial Correlations. Estimates a regression model, saves partial residuals, and uses the partial residuals to compute the partial correlations of the independent variables. Three temporary files containing the necessary statistics are created and subsequently deleted. Upon completion, no data file is open.


Permuted Data Matrix. Uses hierarchical clustering to create a permuted data matrix plot. Colors in the plot represent the size of the values in the matrix. Dendrograms representing the clustering of variables and cases appear along the edges of the matrix.


Polar Area. Creates a polar area graph, plotting two continuous variables against a 
categorical variable in polar coordinates. 


Scatterplot with 2 Y-Variables of Different Ranges. Plots two continuous variables 
against a third, using a legend to identify the symbols for each y-variable. 


Spearman Correlation Probabilities. Computes a matrix of Spearman correlations and their associated probability values. To compute the Spearman correlations, it replaces the raw values of the variables in the working data file with their ranks. Be careful not to save the ranked data over your original data file.


Stacked Bar. Creates a stacked bar chart. The chart displays a single bar for each level of a categorical variable. Each bar represents multiple variables, with subdivisions representing the average for each variable.
Stacked Bar Percentage. Creates a stacked bar chart in which the bar segments represent percentages. The chart displays a single bar for each level of a categorical variable, and each bar represents multiple variables. SYSTAT sums the means within each group and forms each bar proportionately to this sum.


Triangle Plot. Plots three variables in a triangular coordinate system. This plot is often 
used to display mixtures composed of three components. 


US Map with Plot. Superimposes circles on a map. The size of the circles reflects the values of a variable. The data file must contain variables named lablon and lablat.


Voronoi Tessellation. Plots a Voronoi tessellation on a map. The data file must contain variables named lablon and lablat. You can specify the color and line style for the boundary lines, as well as the projection used.
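The labeling rule in the Outliers and Influence (Linear Regression) item can be illustrated for a simple linear regression. The Python sketch below is not SYSTAT's code, and it makes assumptions that the text does not state:

- It uses internally studentized residuals, r_i = e_i / (s * sqrt(1 - h_i)), where h_i = 1/n + (x_i - x̄)^2 / Sxx is the leverage.
- It uses Cook's distance D_i = r_i^2 * h_i / (p * (1 - h_i)), with p = 2 fitted parameters.

SYSTAT's studentized residuals may be defined differently, for example as externally studentized residuals. The cutoffs 2 and 0.5 are the ones stated in the item.

```python
import math


def regression_diagnostics(x, y, resid_cut=2.0, cook_cut=0.5):
    """Fit y = a + b*x by least squares and flag outliers and influential cases.

    Outlier: |internally studentized residual| > resid_cut.
    Influential: Cook's distance > cook_cut.
    Returns a list of (index, studentized residual, Cook's D, labels).
    """
    n, p = len(x), 2  # p = number of fitted parameters (intercept, slope)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))  # residual std. error
    results = []
    for i, (xi, e) in enumerate(zip(x, resid)):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # leverage of case i
        r = e / (s * math.sqrt(1 - h))      # internally studentized residual
        d = r * r * h / (p * (1 - h))       # Cook's distance
        labels = []
        if abs(r) > resid_cut:
            labels.append("outlier")
        if d > cook_cut:
            labels.append("influential")
        results.append((i, r, d, labels))
    return results


if __name__ == "__main__":
    # Hypothetical data: a straight line with the last case pushed off it.
    x = list(range(1, 11))
    y = [2 * xi + 1 for xi in x]
    y[9] += 20
    for i, r, d, labels in regression_diagnostics(x, y):
        if labels:
            print(i, round(r, 3), round(d, 3), labels)
```

In the usage example, moving the case with the largest x off the line makes it both an outlier and influential, because its leverage is high. The same shift applied to a case near the middle of the x range would produce an outlier whose Cook's distance stays below 0.5.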
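The Spearman Correlation Probabilities item works in two steps: it replaces each variable's values with ranks, and then it correlates the ranks. The Python sketch below shows those two steps for one pair of variables and omits the probability values. It assumes that tied values receive the average of the ranks they span, a common convention that the text does not specify. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(average_ranks([10, 20, 20, 5]))
    print(spearman([1, 2, 3, 4], [10, 100, 1000, 10000]))
```

Because only ranks enter the calculation, any monotonic relationship, such as the exponential one in the usage example, yields a correlation of 1. This is also why the item overwrites the working data with ranks.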
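The two Count as Percent items differ only in the denominator. One divides by all cases in the file; the other divides by the valid (nonmissing) cases only. The Python sketch below makes the difference concrete. Representing a missing value as None and the function name `percent_of_cases` are assumptions of the sketch, not SYSTAT conventions.

```python
from collections import Counter


def percent_of_cases(values, valid_only=False):
    """Percent of cases at each level of a categorical variable.

    Missing values are represented as None. With valid_only=False the
    denominator is every case; with valid_only=True it counts only the
    cases whose value is not missing.
    """
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values()) if valid_only else len(values)
    if total == 0:
        return {}
    return {level: 100 * n / total for level, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical data: four cases, one of them missing.
    sample = ["a", "a", "b", None]
    print(percent_of_cases(sample))                   # denominator 4
    print(percent_of_cases(sample, valid_only=True))  # denominator 3
```

With missing data, the percentages based on all cases sum to less than 100. The percentages based on valid cases always sum to 100.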


Customization of the Graph Gallery 


Creating customized galleries allows you to create an environment conducive to your 
most commonly performed tasks and analyses. You can: 


■ Group related items in specialized galleries
■ Add items to the gallery
■ Modify existing items in the gallery
■ Delete items from the gallery


Graph Gallery Folder 


By default, SYSTAT uses the files contained in the 'Systat 12\Gallery' folder to construct the Graph Gallery. You can change the folder used to create the gallery on the File Locations tab of the Global Options dialog.

Although no upper limit on the number of items in the Graph Gallery exists, locating 
a particular entry becomes more difficult as items get added. In addition, due to the 
alphabetical ordering of the items, grouping similar graphs together requires that their 
names begin with the same letter. 

To impose a more defined structure on the Graph Gallery, create subfolders in the 'Systat 12\Gallery' folder. Place similar items in each subfolder, resulting in a collection of graph galleries. For example, create a subfolder named 'ANOVA Graphics' for all gallery items that create graphics commonly used in the analysis of variance. When you need these items, change the gallery folder to that subfolder.


Creating Gallery Items
To generate an item in the Graph Gallery:


■ Create the underlying command file. If you are unfamiliar with SYSTAT's command language, use the menus and dialogs to create the desired output and save the Log tab of the command pane to a command file.
■ In the command file, replace aspects that should vary from graph to graph with tokens. Common graph characteristics to replace include variables, file names, numbers, and strings.
■ Save the command file to the gallery folder.
■ Save a bitmap to the gallery folder under the same file name used for the command file.
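Conceptually, token processing fills each &-prefixed token with a value supplied when the item runs. The Python sketch below imitates only that substitution step. `substitute_tokens` is a hypothetical helper. The sketch does not reproduce SYSTAT's TOKEN command, which also prompts for each value (PROMPT=) and uses TYPE= to determine what kind of value is requested.

```python
import re


def substitute_tokens(template, values):
    """Replace every &name token in a command template with values[name].

    A KeyError is raised if the template uses a token that has no value.
    """
    # A function replacement avoids any escape processing of the values.
    return re.sub(r"&(\w+)", lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    # Hypothetical template and values, for illustration only.
    template = "USE &file\nPLOT &yvar * &xvar"
    print(substitute_tokens(template, {"file": "IRIS", "yvar": "SEPALLEN",
                                       "xvar": "PETALLEN"}))
```

Each token becomes a slot that the user fills at run time. The same template can therefore draw the same kind of graph for any file and variables, which is what makes a gallery item reusable.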


Icon Creation and the Graph Gallery 


Any graphic created in SYSTAT can be saved as a bitmap. If the bitmap is saved to the 
Gallery folder and a command file having the same name exists in that folder, the 
bitmap image appears as an icon in the Graph Gallery dialog. However, the area 
displaying the gallery contents has a limited viewable size. To prevent an oversized 
bitmap from dominating the gallery, SYSTAT scales all gallery images to fit within a 
defined region. This scaling can result in distorted images. 

To lessen distortion, use the bounding rectangle to define a square region around the graphic before saving it. (Note: You must be in Page View in the Graph Editor to adjust the bounding rectangle.)
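To see why a square region helps, compare two ways of scaling a bitmap into a square area. Stretching each axis independently fills the area but distorts any non-square image. Scaling both axes by one factor preserves the shape. The Python sketch below computes both. The 64-pixel square cell and the function names are hypothetical, because the text does not state the size of the gallery region or exactly how SYSTAT scales icons.

```python
def stretch_factors(width, height, cell=64):
    """Independent horizontal and vertical factors that fill a square cell."""
    return cell / width, cell / height


def distortion(width, height, cell=64):
    """Ratio of horizontal to vertical stretch; 1.0 means the shape is kept."""
    sx, sy = stretch_factors(width, height, cell)
    return sx / sy


def fit_preserving_aspect(width, height, cell=64):
    """Scale both axes by one factor so the image fits inside the cell."""
    factor = min(cell / width, cell / height)
    return round(width * factor), round(height * factor)


if __name__ == "__main__":
    for size in [(400, 400), (800, 400)]:
        print(size, distortion(*size), fit_preserving_aspect(*size))
```

A 2:1 bitmap stretched to fill the square is compressed horizontally by half relative to its vertical scaling. The aspect-preserving fit keeps the shape but leaves part of the cell empty. A square bitmap avoids both problems.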




(Figure: example graph with an INCOME axis)


Reducing the amount of image scaling required can also improve the appearance of the image in the gallery. When saving the bitmap, set the file type to Bitmap (*.bmp) and select Save Options from the Save button to open the Graph Save Options dialog box.


(Figure: Graph Save Options dialog box, with the color choices Bilevel, 16 color, 256 color, and 24-bit color)


The BMP Output Filter Setup dialog allows you to specify optional settings to apply 
when saving bitmaps. Select the desired options and click OK. 
Custom Gallery Icons


SYSTAT constructs the icons appearing in the Graph Gallery from the bitmaps in the assigned gallery folder. However, these bitmaps do not need to be created by SYSTAT; any software that saves bitmaps can be used. For example, if you have a command file that creates a mobile for a regression tree, you could use a bitmap of a tree in the gallery instead of an image of the mobile.

When using bitmaps created using other software, keep in mind that the Graph 
Gallery scales images to fit in the dialog. Large bitmaps lose detail and non-square 
bitmaps are distorted when displayed in the Graph Gallery. Small, square images yield 
the best results. 


Modifying Gallery Items 


Item modification involves editing an existing command file or changing an existing 
bitmap. To modify an existing item: 


Open the corresponding command file.

Edit the commands as needed. Replacing characteristics with tokens allows more control over the final output. Replacing tokens with values (variables, file names, numbers, or strings) provides more uniformity from graph to graph.

Save the edited file.

Bitmaps must be edited with software designed for image manipulation.


Deleting Gallery Items 


To delete an item:
Remove the corresponding command file from the gallery folder.
Remove the matching bitmap as well; a bitmap without a command file serves no purpose.


If you remove every command file from the assigned gallery folder, the Graph Gallery appears empty. Deleting the assigned gallery folder itself disables the Graph Gallery until you assign a new gallery folder.
Generalizing the Graph Gallery


A single Graph Gallery item can produce as many graphs as desired or no graphs at all. 
You can extend the functionality of the gallery by placing command files that perform 
other tasks in the gallery folder. Any analysis, transformation, or file manipulation 
available using SYSTAT's command language can be used as a Graph Gallery item. 
For example, the following commands apply the log transformation to a set of 
variables. 


The input is: 


TOKEN &varlist / TYPE=NMULTIVAR,
PROMPT='Select the variables to transform to logarithms.'
LET (&varlist) = L10(@)


Saving these commands to the file 'Log Transformation.syc' in the gallery folder results in a Graph Gallery item called Log Transformation. The gallery displays this item using a default icon. Double-clicking the icon prompts for the variables to transform and replaces their data with log-transformed values.


Acronym & Abbreviation Expansions


A 

ABS - absolute value 

ACF - autocorrelation function 

ACOLOR - color axes 

ACS - arccosine 

ACT - actuarial life table 

AD test - Anderson-Darling test

ADDTREE - additive trees 

ADFG - asymptotically distribution free estimate 
biased, Gramian 

ADFU - asymptotically distribution free estimate 
unbiased 

ADJSEASON - seasonal adjustment 

AHMAX - maximum extent 

AHMIN - minimum extent 

AIC - Akaike information criterion 

AID - automatic interaction detection 

ALT - alternative 

ANCOVA - analysis of covariance 

ANG1 - deviation of angles from north in a
clockwise direction

ANG2 - deviation of angles from horizontal (for
3D models)

ANG3 - tilt angle 

ANOVA - analysis of variance 

ANOVAHYPO - hypothesis tests in analysis of 
variance 

AR - autoregressive 

ARIMA - autoregressive integrated moving 
average 

ARL - average run length 
ARMA - autoregressive moving average
ARS - adaptive rejection sampling 
ASCII - American Standard Code for 
Information Interchange 

ASE - asymptotic standard error 

ASN - arcsine 

ATH - arc hyperbolic tangent 

ATN - arctangent 

AVERT - vertical extent 

AVG - average


B 

BC - Bray-Curtis similarity measure 
BCa - bias-corrected and accelerated
BCF - Beta cumulative function 
BDF - Beta density function 
BETACORR - beta correction 

BIC - Bayesian information criterion 
BIF - Beta inverse function 

BMP - Windows bitmap 

BOF - beginning-of-file 

BOG - beginning-of-BY group 
BONF - Bonferroni

BOOT - bootstrap 

BRN - Beta random number 


C

CART - classification and regression trees 
CBSTAT - column basic statistics 

CCF - Cauchy cumulative function 

CCF - cross-correlation function 

CDF - Cauchy density function 

cdf/CF - cumulative distribution function 
CDFUNC - coefficients for canonical variables 
CFUNC - coefficients for the classification
functions 
CGM - Computer graphics metafile: binary or 
clear text 
CHAZ - cumulative hazard 
CHISQ - Chi-square distribution 
CHOL - Cholesky decomposition 
CI - confidence interval 
CIF - Cauchy inverse function 
CIM - confidence interval of mean 
CLASS - classification 
CLSTEM - stem and leaf plot for column 
CMeans - canonical scores of group means 
CMULTIVAR - multiple string variables 
COEF - coefficients 
COL/col - column 
COLPCT - Column percentages 
CONFIG - configuration 
CONT - Contingency coefficient 
CONV - convergence 
CORAN - correspondence analysis 
CORR - correlations 
CORR1 - single correlation coefficient
CORR2 - equality of two correlations
COV - covariance 
Cp - process capability index 
CPL - process capability based on lower 
specification limit 
CPU - process capability based on upper 
specification limit 
Cpk - process capability index for off-centered
process
CR - confidence region 
CRA - cost of response above UTL 
CRB - cost of response below LTL 
CRN - Cauchy random number 
CSCORE - canonical scores 
CSIZE - size of characters 
CSQ - Chi-square 
CSTATISTICS - column statistics 
CSV - comma separated values 


CUSUM - cumulative sum 

CUSUM HI - Upper cumulative sum 
CUSUM LO - Lower cumulative sum 
CV - coefficient of variation 

CVI - cross validation index 


D 

DBF - dBASE files

DC - deciles of risk 

DECF - Double exponential cumulative function 
DEDF - Double exponential density function 
DEIF - Double exponential inverse function 
DENFUN - density function 

dep. - dependent 

DERN - Double exponential random number 
DET - determinant 

DEVI - deviates (observed values - expected 
values) 

DEXP - Double exponential distribution 

df - degrees of freedom 

DF - distribution function 

DHAT - estimated distance 

DIF - data interchange format 

DIM - dimension 

DISCRIM - discriminant analysis 

DIST - distance 

DIT - dot histogram 

DOE - design of experiments 

DOS - disk operating system

DPMO - defects per million opportunities 
DPU - defects per unit 

DTA - Stata files 

DUCF - Discrete uniform cumulative function 
DUDF - Discrete uniform density function 
DUIF - Discrete uniform inverse function 
DUNIFORM - Discrete uniform 

DURN - Discrete uniform random number 
DWLS - distance weighted least-squares 


E 
ECF - Exponential cumulative function 
EDF - Exponential density function
EEXP - extreme value exponential 
EIF - Exponential inverse function 
EIGEN - eigenvalues 

ELAMBDA - exp(lambda) 

EM - expectation-maximization 

EMF - Windows enhanced metafile 
ENCF - Logit normal cumulative function 
ENDF - Logit normal density function 
ENIF - Logit normal inverse function 
ENORMAL - Logit normal 

ENRN - Logit normal random number 
EOF - end-of-file 

EOG - end-of-BY group 

EPS - Encapsulated postscript 

ERN - Exponential random number 
ES - exhaustive search 

ESS - error sum of squares 

EW - extreme value Weibull 

EWMA - exponentially weighted moving average 
EXP/exp - exponential/expected


F 

FAR - false-alarm rates 

FCF - F cumulative function 
FCOLOR - color foreground 

FDF - F density function 

FIF - F inverse function 

FINV - inverse of the F cumulative 
FITC - fitting distribution: continuous 
FITD - fitting distribution: discrete 
FITDIST - fitting distributions 
Flexibeta - flexible beta 

FPLOT - function plots 

FRN - F random number 

FTD - folded trellis detector 
FTDEV - Freeman-Tukey deviate 
FULLCOND - full conditional 
FUN - function 


G 




GCF - Gamma cumulative function 
GCOR - groupwise correlation matrix 
GCOV - groupwise covariance matrix
GCV - generalized cross validation 
GDF - Gamma density function 

GECF - Geometric cumulative function 
GEDF - Geometric density function 
GEIF - Geometric inverse function 
GEN - general Toeplitz structure 
GERN - Geometric random number 
GG - Greenhouse-Geisser

GIF - Gamma inverse function 

GIF - Graphics Interchange Format 
GLM - generalized linear models 
GLMHYPO - hypothesis tests in general linear model

GLMPOST - post hoc estimate for repeated measures in general linear model

GLS - generalized least-squares 

GMA - geometric moving average 

GN - Gauss-Newton method 

GOCF - Gompertz cumulative function 
GODF - Gompertz density function 
GOIF - Gompertz inverse function 
GORN - Gompertz random number 
GRN - Gamma random number 

GUCF - Gumbel cumulative function
GUDF - Gumbel density function
GUIF - Gumbel inverse function
GURN - Gumbel random number


H 

H & L - Hosmer and Lemeshow

HC - heteroscedasticity-consistent 

HCF - Hypergeometric cumulative function 
HDF - Hypergeometric density function 
HF - Huynh-Feldt

HGEOMETRIC - hypergeometric 

HIF - Hypergeometric inverse function 
HIST - histogram 

HKB - Hoerl, Kennard, and Baldwin 
H-L trace - Hotelling-Lawley trace

HR - hit-rates 

HRN - Hypergeometric random number 
HSD - honestly significant differences 
HTERM - terms tested hierarchically 
HTML - hypertext markup language
HYMH - hybrid Metropolis-Hastings 


I 

IF - Inverse cumulative distribution function 
IGAUSSIAN - inverse Gaussian 

IGCF - Inverse Gaussian cumulative function 
IGDF - Inverse Gaussian density function 
IGIF - Inverse Gaussian inverse function 
IGRN - Inverse Gaussian random number 
IIDMC - independently and identically distributed Monte Carlo

IMPSAMPI - importance sampling integration 
IMPSAMPR - importance sampling ratio 
I-MR - individual and moving range 
Ind/indep - independent 

IndMH - Independent Metropolis-Hastings 
INDSCAL - individual differences scaling 
INITSAMP - initial sample 

INTEG FUN - integrated function 

IPA - iterated principal axis 

ITER - iterations 


J 

JACK - jackknife 

JCLASS - jackknifed classification 

JMP - JMP v3.2 data files 

JPEG/JPG - joint photographic experts group 


K 

K-M - Kaplan-Meier 

KNBD - kth nearest neighborhood 

KRON - Kronecker product 

K-S test - Kolmogorov-Smirnov test 

KS1 - one sample Kolmogorov-Smirnov tests
KS2 - two sample Kolmogorov-Smirnov tests 


L 

LAD - least absolute deviations 

LB - larger the better 

LCF - Logistic cumulative function 
LCHAZ - log cumulative hazard 

LCL - lower control limit 

LCONV - log-likelihood convergence criteria 
LDF - Logistic density function 

LGM - log gamma 

LGST - logistic 

LIF - Logistic inverse function 

L-LILL - log likelihood 

LMS - least median of squares
LMSREG - least median of squares regression 
LNCF - Lognormal cumulative function 
LNDF - Lognormal density function 
LNIF - Lognormal inverse function 
LNOR/LNORMAL - lognormal 

LNRN - Lognormal random number 
loc - location 

LOG1 - one-parameter logistic (Rasch)
LOG2 - two-parameter logistic 

LOGIT - logistic regression 
LOGITHYPO - hypothesis tests in logistic regression

LOGLIN - loglinear modeling 

LR - likelihood ratio 

LRCHI - likelihood ratio chi-square 
LRDEV - likelihood ratio of deviate 
LRN - Logistic random number 

LS - least-squares 

LSD - least significant difference 

LSL - lower specification limit 

LSQ - least-squares 

LTAB - life tables 

LTL - lower tolerance limit 

LW - Lawless and Wang 


M 
MA - moving average 




MAD - mean absolute deviation 

MAHAL - Mahalanobis distances 

MANCOVA - multivariate analysis of covariance 
MANOVA - multivariate analysis of variance 
MANOVAHYPO - hypothesis tests in MANOVA

MANOVAPOST - post hoc estimate for repeated measures in MANOVA

MAR - missing at random 

MAX - maximum 

MAXSTEP - maximum number of steps 

MCAR - missing completely at random 

MCMC - Markov Chain Monte Carlo 

MDPREF - multidimensional preference 

MDS - multidimensional scaling 

MIN - minimum 

M-H - Metropolis-Hastings

MIS - number of missing values 

MIX - mixed regression 

MIXHIER - mixed regression for data having a hierarchical structure

MIXMULTY - mixed regression for data having a multivariate structure

ML - Maximum Likelihood 

MLA - maximum likelihood analysis 

MLE - maximum likelihood estimate 

MML - maximum marginal likelihood 

MRC - Multiple Regression and Correlation 

MS - mean squares 

MSE - mean square error 

MSIGMA - sigma measurement 

MT - Mersenne-Twister 

MTW - MINITAB v11 data files 

MU2 - Guttman's mu2 monotonicity coefficients 
MULTIVAR - multiple variables 

MW - minimum within sum of squares deviations 
MWL - maximum Wishart likelihood 


N 
NAR - non-stationary first-order autoregressive 
NB - nominal the best 




NBB - nominal-the-best: bilateral tolerance 
NBCF - Negative binomial cumulative function 
NBD - number of active bounds on parameter values

NBDF - Negative binomial density function 
NBIF - Negative binomial inverse function 
NBINOMIAL - Negative binomial 

NBRN - Negative binomial random number 
NBU - nominal-the-best: unilateral tolerance 
NCAT - number of categories 

NCF - Binomial cumulative function 

NCOL - number of columns 

NDF - Binomial density function 

NDMAX - maximum number of points 
NDMIN - minimum number of points 

NEM - number of EM iterations 

NEXPO - negative exponential 

NIF - Binomial inverse function 

NIPALS - Nonlinear iterative partial least squares
NLAG - number of lags 

NLLOSS - nonlinear loss functions 
NLMODEL - nonlinear models 

NMIN - minimum count 

NMULTIVAR - multiple numeric variables 
NONLIN - nonlinear models 

NP - number nonconforming

NPAR - nonparametric 

NREC - non-recreationist 

NRN - Binomial random number 

NROW - number of rows 

NRP - number of apparently redundant parameters

NSAMP - number of sub-samples 

NSPLIT - maximum number of splits 

NX - number of nodes along the x axis 
NXDIS - number of discretization points in the x (North) direction

NY - number of nodes along the y axis
NYDIS - number of discretization points in the y (East) direction

NZ - number of nodes along the z axis 




NZDIS - number of discretization points in the z (Depth) direction


O

Obs - observed

OBSFREQ - observed frequency 

OC - operating characteristic 

ODBC - open database connectivity
OFREQ - outlier frequencies 

OLS - ordinary least-squares 
ORTHEQ - equally spaced orthogonal component

ORTHUN - unequally spaced orthogonal component


P
P - Proportion nonconforming 
PACF - Pareto cumulative function 
PACF - partial autocorrelation function 
PADF - Pareto density function 
PAIF - Pareto inverse function 
PARAM - parameters 
PARN - Pareto random number 
PCA - process capability analysis 
PCF - iterated principal axis factoring 
PCF - Poisson cumulative function 
PCNTCHANGE - percentage change 
PCT - Macintosh PICT 
PDF - Poisson density function 
pdf - probability density function 
PDL - polynomial distributed lag 
PERMAP - perceptual mapping 
PIF - Poisson inverse function 
PLIMITS - probability limits 
PLS - partial least squares
pmf - probability mass function 
PMIN - minimum proportion 
PNG - Portable Network Graphics 
POLY - polygon 
POSAC - partially ordered scalogram analysis with coordinates


P-P - probability plot 

PP - process performance 

Ppk - process performance index for off-centered process

PPL - process performance based on lower specification limit

PPM - parts per million

PPU - process performance based on upper specification limit

PRE - percentage reduction error 
PREFMAP - preference mapping 

PRN - Poisson random number 

PROB - probability 

PROP1 - single proportion

PROP2 - equality of two proportions 

PS - PostScript 

PVAF/p.v.a.f. - present value annuity factor
p-value - probability value 


Q 

QC - quality control 

QMLE - quasi maximum likelihood estimate 
QNTL - quantiles 

QPLOT - quantile plots 

Q-QPLOT - two sample quantile plot 
QRD - QR decomposition 

QS - quick search 

QSK - quantitative symmetric similarity coefficients (or Kulczynski measure)
QUASI - Quasi-Newton method 


R 

R & R - repeatability and reproducibility

R chart - range chart 

RADMAX - maximum horizontal direction for the search radius

RADMIN - minimum horizontal direction for the search radius

RAND - random 

RANDSAMP - random sampling 

RANKREG - rank regression 
RBSTAT - row basic statistics

RCF - Rayleigh cumulative function 

RDF - Rayleigh density function 
RDISCRIM - robust discriminant 

RDIST - robust distance 

RDVER - vertical direction for the search radius 
REPAR - reparametrize 

REPS - replicates 

RESID - residuals 

RIF - Rayleigh inverse function 

RJS - rejection sampling 

RMS - root mean square 

RMSEA - root mean square error of approximation

RMSSTD - root mean square standard deviation 
ROC - receiver operating characteristic 
ROWPCT - Row percentages 

RRN - Rayleigh random number 

RS - response surface 

RSE - robust standard errors

RSEED - random seed 

RSM - response surface methods

RSQ - stress and squared correlation 

RSS - residual sum of squares 
RSTATISTICS - row statistics 

RTF - rich text format 

RWM-H - random walk Metropolis-Hastings 
RWSTEM - stem and leaf plot for rows 


S 

S chart - standard deviation control chart 

SANG1 - angle (in degrees) of the first minor axis of the search ellipsoid

SANG2 - angle (in degrees) of the major axis of the search ellipsoid

SANG3 - angle (in degrees) of the second minor axis of the search ellipsoid

SAV - SPSS files 

SB - smaller the better 

sc - scale 

SC - set correlation 




SCDFUNC - standardized coefficients for canonical variables

SCF - Studentized cumulative function 
SD - standard deviations 

sd2/sas7bdat - SAS v9 files 

SDF - Studentized density function 
SE/se/S.E. - standard error 

SEK - standard error of kurtosis 

SEM - standard error of mean 

SES - standard error of skewness 

shp - shape 

SIF - Studentized inverse function 
SIMPLS - Straightforward Implementation of Partial Least Squares

SKMEAN - simple kriging mean 

SL - specification limit 

SMIN - minimum split value 

SPLOM - scatter plot matrix 

SQL - structured query language 
SQRT/SQR - square-root 

SRN - Studentized random number 
SRWR - sum of rank weighted residuals 
SS - sum of squares 

SSCP - sum of squares and cross products 
STA - Statistica v5 data files 

STAND - standardized deviates 

SVD - singular value decomposition 
SW - Shapiro-Wilk

SYC/CMD - SYSTAT command files
SYZ/SYD/SYS - SYSTAT data files 
SYO - SYSTAT output files 


T 

T1 - one-sample t-test

T2 - two-sample t-test 

TANALYZE - Taguchi design: analyze 
TCF - t cumulative function 

TCOR - total correlation 

TCOV - total covariance 

TDF - t density function

TESTAT - Test Item Analysis 


TESTATCL - classical test item analysis 
TESTATLOG - logistic item response analysis 
TETRA - tetrachoric correlations 
TGENERATE - Taguchi design: generate 

TIF - t inverse function 

TIFF - Tagged Image File Format 

TLOG - log time 

TLOSS - Taguchi's Loss Function 

TNH - hyperbolic tangent 

TOHC0 - Hypothesis Testing: Zero correlation
TOHC1 - Hypothesis Testing: Specific correlation

TOHC2 - Hypothesis Testing: Equality of two correlation coefficients

TOHP1 - Hypothesis Testing: Single proportion
TOHP2 - Hypothesis Testing: Equality of two proportions

TOHT1 - Hypothesis Testing: One sample t-test
TOHT2 - Hypothesis Testing: Two sample t-test
TOHTPAIRED - Hypothesis Testing: Paired t-test

TOHV1 - Hypothesis Testing: Single variance
TOHV2 - Hypothesis Testing: Two variances
TOHVN - Hypothesis Testing: Several variances

TOHZ1 - Hypothesis Testing: One sample z-test
TOHZ2 - Hypothesis Testing: Two sample z-test
TOL - tolerance 

TPLOT - time series plot 

TPREDICT - Taguchi design: predict 

TRCF - Triangular cumulative function 

TRDF - Triangular density function 

TRI - triangular 

TRIF - Triangular inverse function 

TRIM - trimmed mean 

TRN - t random number

TRP - transpose 

TRRN - Triangular random number 
TSFOURIER - Fourier decomposition of time series
TSIV - Two-Stage Instrumental Variables 
TSLS - Two-Stage Least Squares 


TSP - traveling salesman path 

TSQ chart - Hotelling's T² chart
TSSMOOTH - smoothing time series 
TXT - text format 


U 

U chart - chart showing defects per unit 
UCF - Uniform cumulative function 
UCL - upper control limit 

UDF - Uniform density function 

UIF - Uniform inverse function 

UNCE - uncertainty coefficient 

URN - Uniform random number 

USL - upper specification limit 

UTL - upper tolerance limit 


V 
VAR - variance 
VIF - variance inflation factor 


W

WB - Weibull 

WCF - Weibull cumulative function 
WCOR - pooled within-group correlation 
WCOV - pooled within-group covariance 
WDF - Weibull density function
WHISKER - Box-and-Whisker plot 

WIF - Weibull inverse function 

WMF - Windows metafile

WRN - Weibull random number 


X 

XCF - Chi-square cumulative function 
XDF - Chi-square density function 

XIF - Chi-square inverse function 

XLAG - separation distance between lags 
XLS - Excel format

XLTOL - tolerance for lags 

XMAX - maximum along x axis

XMIN - minimum along x axis 
X-MR chart - Individuals and moving range chart
XPT/TPT - SAS transport files 

XRN - Chi-square random number 

XTAB - Crosstabulations 


Y 
YMAX - maximum along y axis 
YMIN - minimum along y axis 


Z 

Z1 - one-sample z-test 

Z2 - two-sample z-test 

ZCF - Normal cumulative function 
ZDF - Normal density function 
ZICF - Zipf cumulative function 
ZIDF - Zipf density function 
ZIF - Normal inverse function 
ZIIF - Zipf inverse function 
ZIRN - Zipf random number 
ZMAX - maximum along z axis 
ZMIN - minimum along z axis 
ZRN - Normal random number 


Index


A


analysis of variance, 492 

anchored bar charts, 492 

Andrews' Fourier plots, 252 
case labels, 257 
commands, 268 
compared to parallel coordinate displays, 254 
examples, 269 
overview, 249 
standardizing data, 254 

Andrews' smoothing, 189 

Arcview files, 346 

arrow plots, 264 

attention maps, 47 

axes 
examples, 388, 389, 390, 391, 392, 393 
grid lines, 367, 445 
reverse scale, 367, 442 
tick marks, 365 
transformations, 415 
transposing, 363, 365 

axis, 440 


B 


bar charts, 23, 27 
anchored, 30 
bar width, 30 
commands, 48 
converting to another chart, 426, 434 
dual displays, 28 
error bars, 359 
examples, 49, 53, 55, 58, 61, 64, 69, 71, 75, 79, 
81, 85, 90, 95, 96, 97, 101, 110, 470 
grouping variable(s), 28 
median bar charts, 30 
multiplot, 28 
multiplots, 27, 267 




overlaying, 28 
overview, 23 
percentages, 30 
pseudo 3D, 30 
repeated trials, 28 
stack, 28 
bar charts:pseudo perspective, 16 
bar width, 130 
Benford's law distribution, 158, 323 
beta distribution, 157 
bin size, 130 
binomial distribution, 157 
bisquare smoothing, 189 
bitmaps, 491, 497, 499 
bivariate scatterplots, 180 
BMP, 491, 497, 499 
border displays, 161, 181, 492 
bounding rectangle, 414, 497 
box plots, 259 
combine with symmetrical dot density, 124 
examples, 138, 139, 141, 145, 147 
far outside values, 121 
hinges, 121 
notched, 124 
options, 124 
outside values, 121 
overview, 113 
box-and-whisker plots, 121 
bubble plots, 492 


C

Cartesian coordinates, 361
case labels, 257 

casement plot, 259 
Cauchy distribution, 157 


Chernoff's faces, 264 
chi-square distribution, 157 


Cleveland median slope adjustment, 370


cluster bar, 494 
clustered bar charts, 493 
colors, 372 


confidence ellipses, 161, 182, 415 


confidence intervals, 415 


confidence kernels, 161, 162, 182, 261, 415 


contour plots, 127, 180, 320 
convex hulls, 161, 182, 261 
coordinates 

Cartesian, 361 

polar, 361 

rectangular, 361 

spherical, 361 

triangular, 361 
cumulative frequencies, 130 


cylindrical projections, 361, 362 


D 


Delaunay triangulation, 163, 182, 261 


Delboeuf figure, 7 
Density Displays, 277 
density function plots, 127 


examples, 135, 149, 277, 278 


options, 130 
overview, 113 


density stripes, 113, 126, 127, 259 


examples, 138, 167 


discrete uniform distribution, 323 


discriminant analysis, 493 
dit plots, 113, 125, 126 
examples, 141 


dot charts, 30 
commands, 48 


converting to another chart, 426, 434 


dual displays, 32 
error bars, 359 


examples, 49, 53, 55, 58, 61, 64, 69, 71, 75, 81, 


85, 90, 99, 110, 310 
grouping variable(s), 31 


median dot charts, 33 
multiplots, 30, 32, 267 
overlaying, 32 
overview, 23 
percentage, 33 
pseudo 3D, 33 
repeated trials, 32 
dot density plots, 125 
examples, 138, 145 
jittered, 127 
overview, 113 
symmetric, 126 


dot histograms, 125, 126, 259, 493 


examples, 145 


double exponential distribution, 157 


drawing attributes, 417 
drawing tools, 416, 417 


dual displays, 27, 28, 30, 37, 41 


DWLS smoothing, 188 
Dynamic Explorer, 415 


E 


ellipses, 182, 260 
Erlang distribution, 158, 323 
error bars, 359 

examples, 386 
exponential distribution, 157 
extreme outliers 

box plots, 121 


F 


F distribution, 157 
far outside values 

box plots, 121 
Fechner's function, 5 
fences 

box plots, 121 
figure-ground separation, 8 
fill patterns, 375 

examples, 399 
fisheye projections, 339 
fit lines, 185 
Fourier blobs, 264
Fourier plots 
See Andrews’ Fourier plots 
frame options 
all axes settings, 438 
change background, 436 
changing coordinate, 434 
chart type, 434 
framed rectangular plots, 264 
frames, 419 
realigning, 420 
frequency polygons, 113, 120, 259 
examples, 132, 141 
function plots 
commands, 321 
contour plots, 320 
examples, 321, 322, 325, 326, 327, 329, 330, 333 
mosaic plots, 321 
overview, 319 
fuzzygram, 113, 120, 259 
examples, 138 


G 


gamma distribution, 157 
gap histograms, 113, 120, 259 
examples, 132, 141 
geometric distribution, 157 
Gestalt psychology, 8 
gnomonic projections, 339, 361, 362 
Gompertz distribution, 157 
gradient options, 461 
wireframe, 461 
graph editing, 411, 419 
annotation tools, 416, 417 
axis, 440 
case labels, 441 
Dynamic Explorer, 415 
examples, 467, 470 
fill patterns, 465, 467 
layout, 428, 436 
legends, 446 
selecting points, 416 
symbols, 466 
titles, 425, 433 




Graph editor, 411 
Graph view, 413 
Page view, 414 
graph gallery 
bitmaps, 491, 497, 499 
command files, 491 
creating graphs, 490 
creating items, 497 
customization, 496 
default icons, 491 
deleting items, 499 
distorted images, 497 
folders, 496 
generalizing, 500 
grouping items, 496 
items without graphs, 500 
modifying items, 499 
organization of, 490 
overview, 489 
profiles, 497 
required files, 491 
saving images for use in, 497 
subgalleries, 496 
supplied items, 491 
token processing, 491 
graph layout 
examples, 394, 395 
graph properties, 421 
Graph view, 412, 413 
graphic design, 12 
graphs 
annotating, 416, 417 
axes, 363, 415 
axis labels, 366, 441 
case labels, 257 
character size, 380, 408 
colors, 372 
commands, 384 
coordinates, 361 
Dynamic Explorer, 415 
editing, 411, 414, 415, 416, 417, 419 
fill patterns, 372 
font, 380 
frames, 419 
global features, 357, 380 
Graph editor, 411 
identifying individual points, 416
interactivity, 421
labels, 379, 380
layout, 428, 436
legends, 371
line thickness, 380
lines, 376
local options, 357
margins, 414
panes, 419
positioning, 414
realign frames, 419
resizing, 413
rotating 3-D graphs, 415
scales, 363
scaling, 380
surfaces, 376
symbols, 379
titles, 369, 425, 433
tooltips, 419
grid cuts, 376
Gumbel distribution, 157


H

hex grid cuts, 163, 184, 456
hexagonal binning, 163, 184, 455
high-low-close plots, 191
hinges, 121
histogram, 494
histograms, 117, 259
cumulative, 119
dot histograms, 126
examples, 132, 139, 141, 148
gap, 120
icons, 264
options, 119
overview, 113
horizontal bar charts, 494
Huber smoothing, 189
hulls, 161, 162, 182
hypergeometric distribution, 157


I

icon plots
commands, 268
examples, 290, 293, 295, 297, 299, 300, 301, 302, 303
icon location, 266
icon types, 262
ordering, 265
overview, 249
standardizing data, 254
iconic memory, 2
in place editing
examples, 486
influence plots, 161, 182
interaction plots, 492
interquartile range
box plots, 121
inverse Gaussian distribution, 157
inverse smoothing, 188


J

jittered dot density plots, 113, 125, 126
examples, 138, 141


K

kernel curves, 127
kernels, 161, 162, 182, 260
Kriging smoothing, 189


L

labels, 257
editing in graphs, 441
pyramid charts, 45
Lambert cylindrical projections, 339, 361, 362
landscape orientation, 413
lasso selection, 416
legend options
examples, 483
labels, 372, 447
layout, 447
location, 372
title, 372, 447
legends, 371
examples, 397
repositioning, 413 
line charts 
commands, 48 
converting to another chart, 426, 434 
dual displays, 35 
error bars, 359 
examples, 49, 53, 55, 58, 61, 64, 69, 71, 75, 81, 
90, 110, 309 
grouping variables, 35 
median line charts, 36 
multiplots, 267 
overlaying, 35 
overview, 23 
percentages, 36 
pseudo 3D line charts, 37 
repeated trials, 35 
line charts:pseudo perspective, 16 
linear regression, 495 
linear smoothing, 188 
lines, 376 
examples, 400 
log smoothing, 188 
log transformations, 500 
logarithmic series distribution, 158, 323 
logistic distribution, 157 
logit normal, 157 
logit normal distribution, 323 
loglogistic distribution, 158, 323 
lognormal distribution, 323 
long-term memory, 2 
LOWESS smoothing, 188 


M 


magnitude of sensation:versus stimulus intensity, 5 
maps, 337 

case labels, 257 

examples, 303, 346, 347, 348, 352, 354, 355 

files, 340, 342, 343, 344 

islands, 343 

projections, 340 

sources of map files, 345 

spherical, 340




matrix columns, 181 
mean smoothing, 188 
median, 33, 36, 45, 121 
median smoothing, 188 
memorization, 2 
Mercator conformal projections, 339, 361 
midrange smoothing, 188 
Miller cylindrical projections, 339, 361 
minimum spanning tree, 161, 162, 182 
mirror displays 
See dual displays 
mode smoothing, 188 
mosaic plots, 129, 180, 321 
Muller-Lyer illusion, 7 
multiplots, 30, 155, 157, 180, 267 
examples, 305, 306, 309, 310, 311, 313, 314 
multivariate displays 
overview, 249 


N 


negative binomial distribution, 323 

negative exponentially weighted smoothing, 188 
NEXPO smoothing, 188 

non-central chi-square distribution, 158, 323 
non-central F distribution, 158, 323 

non-central t distribution, 158, 323 

normal curves, 127 

normal distribution, 157 

normal probability plots, 155, 157 

notched box plots, 124 


O
oblique projections, 339, 361, 362 
orientation 

of pages, 411 
orthographic projections, 339 
outliers, 495 

box plots, 121 


outside values 
box plots, 121 


P

P-P plots, 157
Page view, 414 
panes, 419 
panning, 413, 419 
parallel coordinate displays 
case labels, 257 
commands, 268 
examples, 269 
overview, 249 
setup, 254 
standardizing data, 254 
Pareto distribution, 157 
perspective pie charts, 17 
Peters projections, 339, 362 
pie charts 
attention maps, 47 
commands, 48 
examples, 49, 61, 64, 97, 100 
grouping variables, 46 
overview, 23 
pseudo 3D, 47 
scales, 47 
slices, 46, 48 
transformations, 48 
pie charts:3-D, 17 
pip marks 
editing in graphs, 442 
Poggendorff illusion, 7
Poisson, 157 
Poisson distribution, 157 
polar coordinates, 361 
Ponzo illusion, 7 
portrait orientation, 413 
power law, 4 
power smoothing, 188 
probability plots, 153, 157 
commands, 163 
distributions, 158 


examples, 167, 170, 171, 172, 174 


multiplots, 160, 267 
options, 161 
overview, 153 


P-P plots, 158 
probability scale, 157 
profile charts, 37 
commands, 48 
converting to another chart, 426 
dual displays, 39 


examples, 49, 53, 55, 58, 61, 64, 69, 71, 75, 90, 
110 


grouping variables, 38 
median profile charts, 41 
multiplots, 37, 39, 267 
overlaying, 39 
overview, 23 
percentages, 40 
pseudo 3-D depth, 41 
repeated trials, 39 
stack, 39 
profile plots, 264 
projections 
for graphs, 361 
pseudo perspective bar charts, 16 
pseudo perspective line graphs, 16 
pyramid charts, 41 
anchored, 45 
command, 48 
dual displays, 43 


examples, 49, 53, 55, 58, 61, 64, 69, 71, 75, 96,


grouping variables, 42 
labels, 45 

median pyramid charts, 45 
multiplots, 41, 43, 267 
overlaying, 43 

overview, 23 

percentages, 44 

pseudo 3D, 45 

pyramid width, 45 
repeated trials, 43 


Q 
Q-Q plots, 155 


quadratic smoothing, 188 


quantile plots, 153, 155 
commands, 163 
examples, 163, 165, 166 
multiplots, 157, 267

one-sample, 155 

options, 161 

overview, 153 

two-sample, 155 
quantitative graphics, 1 
quartiles 

box plots, 121 


R 

Rayleigh distribution, 157 

realign frames, 419 

rectangular coordinates, 361 

region selection, 416 

repeated measures, 122, 126, 180 
residuals, 190 

Robinson projections, 339, 361, 362 


S 


scales, 363 
editing in graphs, 439 
scatterplot matrices, 249, 258 
commands, 268 
diagonals, 260 
examples, 271, 274, 277, 278, 279, 280, 282, 283, 
284, 285, 286, 288, 313 
options, 260 
overview, 249 
smoothers, 185 
scatterplots, 492, 495 
commands, 193 
examples, 193, 196, 197, 198, 201, 204, 205, 206, 
207, 210, 211, 212, 213, 216, 219, 221, 223, 
224, 227, 228, 230, 231, 232, 233, 234, 235, 
236, 238, 240, 241, 243, 244, 245, 305, 306, 
309, 311 
multiplots, 181, 267 
options, 182 
overview, 177 
residuals, 190 
smoothers, 185 
select cases 
in graphs, 416 
separate slice, 46, 453 




short-term memory, 2 

show selection, 416 

sinusoidal projections, 339, 361, 362 
slice labels, 48 

smallest extreme value distribution, 158, 323 
smoothers, 185 

spherical coordinates, 361 

spikes, 161, 182 

spline smoothing, 188 

SPLOM's, 249, 258 

stacked bar charts, 495 


standard deviation 
error bars, 359 


standard error 
error bars, 359 
standardized residuals, 190 
standardizing data, 254 
star plots, 263 
step smoothing, 188 
stereographic projections, 339, 361 
Stevens function, 5 
stimulus intensity:versus magnitude of sensation, 5 
Studentized distribution, 323 
Studentized maximum modulus distribution, 158,
Studentized range distribution, 157 
sun plots, 264 
surface smoothing, 186 
surfaces, 376 
SYC, 491 
symbols 
colors, 372 
examples, 399, 402, 404, 405 
size, 380 


type, 379 
symmetric dot displays, 113, 125, 126 
examples, 141, 145 


T


t distribution, 157 
tension, 131, 187 


text tool, 417 
thermometer plots, 264 
3-D graphs, 127, 129, 180, 361 
3-D smoothing, 186 
three-dimensional graphs, 180, 361 
tick marks, 363, 365 
editing in graphs, 442 
titles, 425, 433 
transformations, 415 
transposing axes, 363 
traveling salesman path, 162, 183 
examples, 236 
Trellis displays, 27, 30, 37, 41, 155, 157, 180, 267 
triangular coordinates, 361 
triangular distribution, 323 
trimmed smoothing, 189 


U 


uniform distribution, 157 
units of measurement, 380 


V 


vector lines, 161, 182 

vertical spikes, 161, 182, 261 
visual illusions, 7 

visual information processing, 2 
Voronoi tessellation, 161, 182, 262 


W


waveform plots, 252 
weather vanes plots, 264 
Weibull distribution, 157 
whiskers, 121 


Z 


Zipf distribution, 157
zooming, 413, 419 
zoom in, 419 
zoom out, 419 
zoom selection, 419 


